J. Korbicz • J.M. Koscielny 
Z. Kowalczuk • W. Cholewa (Eds.) 



Fault Diagnosis 

Models, Artificial Intelligence, 

Applications 




Springer 








Springer-Verlag Berlin Heidelberg GmbH 







Jozef Korbicz • Jan M. Koscielny 

Zdzislaw Kowalczuk • Wojciech Cholewa (Eds.) 



Fault Diagnosis 

Models, Artificial Intelligence, Applications 



With 312 Figures 




Springer 




Sponsoring Editor 

Prof. Janusz Kacprzyk 
Systems Research Institute 
Polish Academy of Sciences 
ul. Newelska 6 
01-447 Warsaw, Poland 
kacprzyk@ibspan.waw.pl 



Prof. Jozef Korbicz 
University of Zielona Gora 
Institute of Control and 
Computation Engineering 
ul. Podgorna 50 
65-246 Zielona Gora, Poland 



Prof. Jan M. Koscielny 

Warsaw University of Technology 

Institute of Automatic Control and Robotics 

ul. Chodkiewicza 8 

02-525 Warszawa, Poland 



Prof. Zdzislaw Kowalczuk 
Technical University of Gdansk 
Dept. of Automatic Control W.E.T.I. 
ul. Narutowicza 11/12 
80-952 Gdansk, Poland 



Prof. Wojciech Cholewa 
Silesian University of Technology Gliwice 
Dept. of Fundamentals of Machine Design 
Konarskiego 18a 
44-100 Gliwice, Poland 



Cataloging-in-Publication Data applied for 

Bibliographic information published by Die Deutsche Bibliothek 

Die Deutsche Bibliothek lists this publication in the Deutsche Nationalbibliografie; 

detailed bibliographic data is available on the Internet at <http://dnb.ddb.de> 

ISBN 978-3-642-62199-4 ISBN 978-3-642-18615-8 (eBook) 

DOI 10.1007/978-3-642-18615-8 

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is 
concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, 
broadcasting, reproduction on microfilm or in other ways, and storage in data banks. Duplication of 
this publication or parts thereof is permitted only under the provisions of the German Copyright Law 
of September 9, 1965, in its current version, and permission for use must always be obtained from 
Springer-Verlag. Violations are liable for prosecution under German Copyright Law. 



springeronline.com 

© Springer-Verlag Berlin Heidelberg 2004 

Originally published by Springer-Verlag Berlin Heidelberg in 2004 

Softcover reprint of the hardcover 1st edition 2004 

The use of general descriptive names, registered names, trademarks, etc. in this publication does not 
imply, even in the absence of a specific statement, that such names are exempt from the relevant 
protective laws and regulations and therefore free for general use. 

Typesetting: Digital data supplied by authors 
Cover-Design: medio Technologies AG, Berlin 
Printed on acid-free paper 62/3020 Rw 5 4 3 2 1 0 




Foreword 



All real systems in nature - physical, biological and engineering ones - can 
malfunction and fail due to faults in their components. Logically, the chances 
for malfunctions increase with the systems’ complexity. The complexity of 
engineering systems is steadily growing due to their growing size and 
degree of automation, and so is the danger of failures and the severity of 
their impact on man and the environment. Therefore, 
in the design and operation of engineering systems, increased attention has 
to be paid to reliability, safety and fault tolerance. But it is obvious that, 
compared to the high standard of perfection that nature has achieved with 
its self-healing and self-repairing capabilities in complex biological organisms, 
fault management in engineering systems is far behind the standards of their 
technological achievements; it is still in its infancy, and tremendous work is 
left to be done. 

In technical control systems, defects may happen in sensors, actuators, 
components of the controlled object - the plant, or in the hardware or soft- 
ware of the control framework. Such defects in the components may develop 
into a failure of the whole system. This effect can easily be amplified by the 
closed loop, but the closed loop may also hide an incipient fault from be- 
ing observed until a situation has occurred in which the failing of the whole 
system has become unavoidable. Even designing the closed loop as robust or 
reliable (using robust or reliable control algorithms) cannot solve the prob- 
lem in full. It may help to make the closed loop continue its mission with the 
desired or a tolerable degraded performance despite the presence of faults, 
but when the faulty device continues to malfunction, it may cause damage 
due to the persistent impact of the faults on man and the environment (e.g., 
leakage in gas tanks or in oil pipes, etc.). So, both robust control and 
reliable control exploiting the available hardware or software redundancy of 
the system may be efficient ways to maintain the functionality of the control 
system, but they cannot guarantee safety or environmental compatibility. 
Realistic fault management has to guarantee dependability including both 
reliability and safety. Dependability has become a fundamental requirement 
in industrial automation, and a cost-effective way to provide dependability is 
Fault-Tolerant Control (FTC). The key issue of FTC is to prevent local faults 
from developing into a system failure that can end the mission of the system 
and cause safety hazards for man and the environment. Because of its in- 
creasing importance in industrial automation, FTC has become an emerging 
topic of both control theory and industrial applications. 

Fault management in engineering systems has many facets. Safety-critical 
systems, where no failure can be tolerated, need redundant hardware to 
accomplish fault recovery. Fail-operational systems are insensitive to any single 
component fault. Fail-safe systems perform a controlled shut-down to a safe 
state with graceful degradation when a critical fault has been detected. Ro- 
bust control ensures the stability or pre-assigned performance of the control 
system in the presence of continuous faults and reliable control does so in 
the event of discrete faults. Generally speaking, fault-tolerant control pro- 
vides online supervision of the system and appropriate remedial actions to 
prevent faults from developing into a failure of the whole system. In advanced 
FTC systems, this is attained with the aid of Fault Diagnosis (FD) in order 
to detect the faulty components, associated with an appropriate system re- 
configuration. But not only has FD become a key issue for FTC, it is also the 
core of Fault-Tolerant Measurement (FTM), the goal of which is to ensure the 
reliability of the measurements in a sensor platform by replacing erroneous 
sensor readings with signals reconstructed by means of the existing analytical 
redundancy. Last but not least, fault diagnosis has become a basic tool for offline 
tasks such as condition-based maintenance and repair carried out according 
to the information from early fault monitoring. 

The backbone of modern fault diagnosis systems is the model-based ap- 
proach, where the model is used as a reference and contains the analytical 
redundancy. Making use of dynamic models of the system under consider- 
ation provides the most powerful approach; it allows us to determine the 
time, size and cause of even small faults during all phases of dynamic system 
operations. 

The classical approach to model-based fault diagnosis makes use of 
functional models in terms of an analytical (“parametric”) representation. A 
fundamental difficulty with analytical models is that there are always modeling 
uncertainties due to unmodeled disturbances, simplifications, idealizations, 
linearizations and parameter mismatches between the real system and its 
model, which are basically unavoidable in the mathematical modeling of real 
systems. They may be subsumed under the term unknown inputs and are not 
mission-critical. But they can obscure small faults, and if they are misinter- 
preted as faults they cause false alarms which can make an FD system totally 
useless. Hence, the most essential requirement for an analytical model-based 
FD algorithm is robustness with respect to the different kinds of uncertainties. 
Surprisingly, much less attention has been paid to the use of qualitative models 
and artificial (or computational) intelligence, in which case the parameter 
uncertainty problem inherently does not occur. The appeal of these approaches 
lies in the fact that qualitative models permit accurate FD decision making 
even under imperfect system modeling and imprecise measurements. Moreover, 
qualitative models may be less complex than comparably powerful analytical 
models. 

The book at hand is one of the few comprehensive works on the market 
covering the fundamentals of model-based fault diagnosis, which has become 
an emerging discipline of modern control engineering, in a very wide context 
including both analytical and non-analytical (fuzzy and neural) models as 
well as approaches based on artificial and computational intelligence. It is a 
multi-authored book, where the editors are well acknowledged experts in the 
field, not only in the theoretical domain but also with respect to industrial 
applications. The latter finds expression in the fact that a substantial part of 
the text is dedicated to practical applications. 

The text is divided into three major parts: Methodology, Artificial Intel- 
ligence, and Applications. The methodological part provides the theoretical 
foundation of the relevant model-based approaches to fault diagnosis. It in- 
cludes an outline of the different types of models used for fault diagnosis, 
the basics of fault detection and isolation, the methods of signal analysis and 
the control-theory-based design techniques for residual generation and evalua- 
tion, mainly dealing with observers and Kalman filters. The two final chapters 
of Part 1 are dedicated to the optimal design of fault detection filters either 
by using observers with eigenstructure assignment or filters with H-infinity 
optimization. Part 2 gives substantial space for a very broad and fundamental 
treatment of the methods of artificial and computational intelligence as far 
as they are important for fault diagnosis. Among the major topics treated 
with respect to the design of diagnostic systems there are evolutionary meth- 
ods, artificial neural networks, the use of Wiener and Hammerstein models 
and the fuzzy logic approach. But there are also chapters dealing with the 
design of unknown input observers for non-linear systems, the multi-criteria 
optimization of diagnostic observers and the pattern recognition approach. 
The last section of Part 2 covers the expert system approach including select- 
ed methods of knowledge engineering and methods of diagnostic knowledge 
acquisition. The final part introduces the reader to issues of practical ap- 
plications. Among the topics considered there are algorithms for monitoring 
complex systems states, examples of industrial diagnostic systems, the appli- 
cation to adaptive tracking filtering, the detection and isolation of leakage in 
pipelines, and a selection of other typical industrial applications. 

The book indeed provides a deep grounding for all those who seek an 
introduction to the theoretical foundation and the basic approaches of fault 
diagnosis, and who want to apply fault diagnosis in an industrial environment. 
Thus, it serves both undergraduate and graduate students in engineering 
sciences, especially control engineering and mechanical engineering, or 
postgraduate students in systems sciences including informatics and computer 
sciences. It may, however, also be considered as a valuable reference book 
or a practical help for industrial control engineers who are in charge of the 
improvement of the reliability and safety of their technical control systems, 
of condition-based maintenance or repair, or, in general, of the design of fault 
tolerant control systems. Therefore, the book can be strongly recommended 
to both students and practitioners in the wide field of the fault management 
of technical systems. 



Duisburg, 26 March 2003 Prof. Dr.-Ing. Dr. h. c. mult. Paul M. FRANK 

University of Duisburg-Essen, Germany 




Preface 



Fault diagnosis is becoming a field of engineering knowledge of growing im- 
portance. Its rank is determined by the constantly increasing complexity of 
contemporary industrial processes and the considerably wide range of dan- 
gers (economic, technological, biological, etc.) that may occur when a given 
process is disrupted because of a failure. It has been known for a long time 
that this area is very important as far as practical usage is concerned. Yet 
the implementation of diagnostic systems has become feasible only with the 
development of computer technology and the possibility of equipping 
industrial facilities with measurement, control, monitoring and supervision 
systems of high computational capacity. It is clear that fault diagnosis, 
including Fault Detection and Isolation 
(FDI), has become an important interdisciplinary subject in modern control 
and computer engineering, next to technical diagnosis. The continuous devel- 
opment of FDI and the interdisciplinary research into it can be observed in 
the literature since the beginning of the 1970s, especially in the proceedings 
of the IFAC symposium on Fault Detection, Supervision and Safety for 
Technical Processes, SAFEPROCESS, organized every third year since 1991. Our 
Polish conferences on Diagnostics of Industrial Processes, DPP, have been 
organized every second year since 1996. 

This book presents today’s state and development tendencies of fault diag- 
nosis systems, in which the model-based approach is considered fundamental 
when designing processes - the model is used as a reference and contains 
analytical redundancy. It is a multi-author book, a result of five years of 
cooperation between research groups from Polish universities, led by the ed- 
itors. Most of the authors and co-authors of the chapters are our present or 
former doctoral students. 

Taking into account the large amount of knowledge about fault diagnosis 
theory and practice presented in the book, it is divided into three major 
parts: Methodology, Artificial Intelligence and Applications. Part I focuses 
on the theoretical foundations of analytical methods of fault diagnosis based 
on mathematical modelling and control theory. Seven chapters are devoted 
to the different types of models used for FDI, a short introduction to the 
methods of signal analysis as well as the designing of observers and Kalman 
filters for residual generation and evaluation. 

Considering the growing complexity of diagnosed processes and serious 
difficulties encountered while obtaining proper mathematical models used 
in FDI, in Part II of the book the methods of integrating qualitative and 
quantitative model information are considered. Most of them are based on 
soft computing approaches, such as artificial neural networks, fuzzy and 
neuro-fuzzy techniques, expert systems and evolutionary algorithms. More- 
over, some chapters focus on designing non-linear observers using genetic 
programming, on the pattern recognition approach to fault diagnosis, and on 
the acquisition of diagnostic knowledge. 

Part III contains a review of selected applications of various diagnostic 
approaches, from commercial systems to the authors’ own achievements. Among 
them are monitoring and diagnostic systems for industrial 
processes, the detection and isolation of leakage in industrial pipelines, the 
diagnosis of the steam boiler of a power plant and the evaporator in a sugar 
factory, and several other applications. 

The book contains the results of research into fault diagnosis systems 
conducted by the research teams from Zielona Gora, Warsaw, Gdansk, Gli- 
wice and Cracow for almost ten years with the kind support of the State 
Committee for Scientific Research in Poland. The Zielona Gora and Warsaw 
teams were additionally supported by the European Union within the 4th 
(COPERNICUS) and the 5th (DAMADICS) Framework Programme. The 
first edition of this book in Polish was published in Poland by Wydawnictwa 
Naukowo-Techniczne, WNT, in 2002 in Warsaw. The English language version 
is not merely a translation of the original one - many chapters include 
some significant improvements, e.g., new or extended parts and examples. 

The book will be of interest to industrial engineers and scientists as well 
as academics who wish to pursue the reliability and fault detection issues 
of safety-critical industrial processes. It is recommended to both graduate 
and postgraduate students of control and mechanical engineering as well as 
system sciences. The wide scope of the book provides them with a good 
introduction to many basic approaches of FDI, and it is also the source of 
useful bibliographical information. 

The editors are indebted to many people for their suggestions and help 
concerning this project. Our special thanks go to Ron Patton and Paul Frank, 
who first engaged our attention in fault diagnosis and who involved us in inter- 
national research projects as well as other activities. The editors and authors 
are also deeply grateful to our long-term supervisors Zdzislaw Bubnicki and 
Czeslaw Cempel, who shared with us their extensive knowledge of system sci- 
ences and technical diagnostics. They strongly supported the Polish edition 
of the book and encouraged us to prepare the English language version. 

Special thanks go to Ms Agnieszka Rozewska from Jozef Korbicz’s office 
for her effort put tirelessly for many months, including holidays and weekends, 
express our gratitude to Ms Beata Bukowiec and Ms Anna Mysliwa from 

Methodology 




Chapter 1 



INTRODUCTION 



Wojciech CHOLEWA*, Jan Maciej KOSCIELNY** 



1.1. Diagnostics of processes and its fundamental tasks 

“Until quite lately the term diagnostics was invariably associated with 
medicine as its field concerning the ways of disease recognition on the 
basis of symptoms. The term is derived from the Greek word diagnosis, 
which means recognition, whereas diagnostikos represents the ability to 
recognize. During the last ten years technical development has caused, 
on the one hand, the growth of the complexity of technical means, 
and on the other, the growth of the responsibility of tasks that are 
carried out with the use of these means. Nowadays, we are witnesses 
of the establishment and development of a new knowledge domain - 
technical diagnostics, which arises as an object of the demand of the 
users of these complex technical means. The goal of this new domain 
is to determine the broadly understood technical state of objects with 
the use of objective methods and means.” 

Such a definition of technical diagnostics and its goals was included in the 
monograph by Cempel published in 1982. 

Encyclopaedia Britannica defines a diagnosis as the process of determin- 
ing the nature of a disease or disorder and distinguishing it from other possible 
conditions. Technical diagnostics as a domain of knowledge started developing 
30 years ago. At the beginning, the objects of diagnostic interest were 
only mechanical machinery and devices. This set has been successively 
extended with electric devices, electronic systems, complex technological devices 
and recently with manufacturing and chemical processes as well as control 
systems. However, research methods were developed separately, within two 
groups: 

• the group of specialists working on machinery; this group was dominated 
by experts in mechanics as well as machinery construction and 
operation, 

• the group of specialists working on technological processes and their 
control, with the main position occupied by experts in control engineering. 

It is necessary to stress that effective attempts have lately been made 
to combine the experience of the above groups. We have learnt 
to observe the operation of numerous objects, estimate their technical state 
and carry out complex experiments. On the grounds of the comparison of 
the performed research, one may conclude that the methods we apply find 
wide recognition. The basic bibliography on diagnostic experiments has been 
formed. A dozen or so monographs have been published on the technical 
diagnostics of various types of diagnostic objects. They deal, for example, 
with the diagnostics of computer and digital systems, the diagnostics 
of machinery and mechanical vehicles and the diagnostics of manufactur- 
ing processes. There have also appeared monographs dealing with selected 
elements of the general theory of technical diagnostics. Moreover, one may 
enumerate a few survey papers. Several conferences devoted to technical di- 
agnostics are organized annually. In 1999 the Polish Society of Technical 
Diagnostics was founded. Technical diagnostics has already become an inde- 
pendent subject in the curricula of technical studies. Many people graduated 
in diagnostics on the basis of a favourable opinion of their activity in that 
domain. Nowadays, a large group of experts with significant experience are 
employed in industry. Numerous methods and terminology peculiar to them 
were built on the grounds of various domains of applications. However, there 
is a lack of one, commonly approved theory of technical diagnostics. Current 
English terminology has not been established either. 

This monograph deals with the broadly understood diagnostics of processes, 
which includes problems of the technical diagnostics of machinery as well as 
the diagnostics of industrial processes. Within these two domains, research 
methods have been worked out that are specific to the objects of interest of 
each domain. Different methods and techniques may be used for diagnosing the 
same technical means to complement each other. Such a complementary solution 
may ensure higher reliability and credibility of diagnosing. Thus it seems 
purposeful to present these problems together in the monograph. 

Machinery diagnostics deals with the estimation of the technical state of 
mechanical devices by a direct examination of their properties and an indirect 
examination of side-effects accompanying their operation, called residual 
processes (Cempel, 1982; Natke and Cempel, 1997). 
The nature of these processes may be mechanical, electrical or thermal, and 
so on. Among them, a special role is played by vibroacoustical processes 
(vibration and noise) that result directly from the operation of each machine. 
This is why they are commonly used in an indirect estimation of 
the technical state of mechanical objects (Callacot, 1979; Mitchell, 1981). 
This field of technical diagnostics is called vibroacoustical diagnostics. One 
can enumerate different kinds of inner and outer factors influencing machinery 
and devices. They are usually the causes of irreversible processes that can 
lead to cumulative changes of the technical state of objects and a gradual 
deterioration of maintenance characteristics. Such processes are continuous by 
nature. According to machinery use and wear, different types of degradation 
are essential. The processes of changes of the technical state of mechanical 
objects significantly depend on the conditions of their maintenance, services, 
control and repair. The assumed continuity of the change processes is 
considered to be specific to technical diagnostics. 

The diagnostics of industrial 
processes deals with the recognition of state changes of such processes, which 
are considered to be a series of purposeful actions realized in a given time 
period by a given set of machinery and devices with a given set of resources. 
Faults and any other destructive events are considered to be the causes of 
these state changes. The task of diagnosing industrial processes is to detect 
faults at an early stage, and to recognize (differentiate between) them. A 
destructive event, such as wear, is considered as a kind of damage that may be 
characterized by different degrees (values) of intensity. It should be detected 
and recognized after a given value is exceeded. Diagnostic tasks formulated 
in such a way significantly extend the problems that are the main tasks of 
control engineering. In this case, we only consider two states: operating and 
damaged ones. In the diagnostics of industrial processes, we apply 
methods of modelling and identification employing the techniques of artificial 
intelligence. The first publication on the diagnostics of industrial processes 
was the monograph by Himmelblau (1978), which dealt with chemical 
processes. One may also mention the works by Pau (1981), and Patton et 
al. (1989). In recent years, monographs have also been published by 
Gertler (1998), Mangoubi (1998), Chen and Patton (1999), 
and Patton et al. (2000). 

The goal of the monograph is to present in a coherent way the various 
research methods which have arisen on significantly different assumptions and 
limitations related to such domains as control engineering, artificial intelli- 
gence, machinery and device design and, finally, the diagnostics of industrial 
processes. It is the first attempt at a holistic consideration of problems 
concerning faults and wear. Methods and techniques that require a knowledge of 
the models of diagnosed objects as well as those that let us examine the 
objects without this knowledge are described together. The presented methods 
are useful in diagnosing processes that are performed by various technical 
objects such as entire technological systems as well as machinery and devices 
used in such branches of industry as the chemical, petrochemical, power, met- 
allurgical, pharmaceutical, food, paper industries and so on. These methods 
may be also used for diagnosing machinery and devices that are considered 
to be objects taking part in the described processes. 

The main activity of the authors of this monograph is carried out in 
Poland. It should be pointed out that the development of technical diagnostics 
in this country was supported by a significant contribution made by such 
Professors as Stefan Ziemba, Ludwik Muller, Zbigniew Engel and Czeslaw 
Cempel. 

1.2. Main concepts 

The rapid and multi-directional development of diagnostic methods as well as 
the multiplicity of their applications is the reason why the Polish terminology 
is not quite uniform and unambiguous. As in the world literature, 
some terms are either used interchangeably or they describe problems 
and phenomena that are different (or not quite the same). In order to avoid 
misunderstandings, selected concepts used in this chapter are defined below. 

Object (machine, device, process) state identification, which is performed 
on the basis of current information about the object, can be considered as 
the following operations: 

• making a diagnosis, whose goal is to determine the current object state, 

• establishing a genesis, whose goal is to determine previous (past) object 
states, 

• giving a prognosis, which consists in predicting further states of the 
object. 

The effects of those operations are called respectively object diagnosis, 
genesis and prognosis. It should be stressed that in some cases the identifica- 
tion of the state can be simplified and limited only to the determination of a 
state change. 

As regards various branches of technology, different ways of state deter- 
mination can be enumerated. According to control theory, an object (device) 
state is understood to be the smallest set containing quantities (variables) 
determined at selected time moments. The knowledge of this set along with 
information about the future time series of input data makes it possible to 
determine future data series of output. The variables of an object (system) 
that characterize its state are called state variables. 
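
The control-theory notion of state can be illustrated with a minimal sketch. 
It assumes a scalar, linear, discrete-time model x[k+1] = a*x[k] + b*u[k], 
y[k] = c*x[k]; the model form, the function name simulate and the coefficient 
values are illustrative assumptions and are not taken from the monograph. The 
sketch shows that the state at a chosen time moment, together with the future 
inputs, is enough to reproduce the future outputs. 

```python
def simulate(a, b, c, x0, inputs):
    """Simulate a scalar discrete-time model (hypothetical example).

    x[k+1] = a * x[k] + b * u[k]
    y[k]   = c * x[k]

    Returns the list of outputs and the state reached after the last input.
    """
    x = x0
    outputs = []
    for u in inputs:
        outputs.append(c * x)   # output depends only on the current state
        x = a * x + b * u       # next state from current state and input
    return outputs, x


# Assumed coefficients and input series, chosen only for illustration.
A, B, C = 0.5, 1.0, 2.0
u_series = [1.0, 0.0, 1.0, 0.0]

# Full simulation from the initial state.
y_full, _ = simulate(A, B, C, 1.0, u_series)

# Stop after two steps, keep only the state, then continue with future inputs.
y_head, x_mid = simulate(A, B, C, 1.0, u_series[:2])
y_tail, _ = simulate(A, B, C, x_mid, u_series[2:])

# The past inputs are not needed: the state x_mid summarizes them.
print(y_full == y_head + y_tail)
```

The intermediate state x_mid carries all the information about the past that 
is relevant for the future outputs, which is the property the definition above 
relies on. 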

Taking into account the branch of technology connected with the main- 
tenance of machinery, an object state (actually, a model state) is described 
as a set of temporary values that determine the object. The values are called 
parameters. Authors of most papers dealing with this subject matter assume 
that the set of these parameters is relevant. It means that such a set con- 
tains only such elements which are important in the given problem case (and 
clearly appear within a mathematical model of machinery). The parameters 
that either determine fault-free operation or meet specially defined norms 
are often exclusively analysed. It leads to studies related to different types 
of states such as maintenance, functional or reliability ones. Because of that, 
the state definition is assumed to be dependent on the goal of the technical 
problem which is to be solved. 

Assuming that state variables are used as a universal state representation, 
one may consider the identification of a state class (instead of the 
identification of parameter values) as the main task. It should be stressed 
that state identification that corresponds exactly to a state class is usually 
enough. As regards numerous practical applications one may notice that there 
is no need to determine the exact values of state variables. It means that a 
state quantity is going to be considered as a single point within the space of 
state patterns. The degrees of the membership of the analysed state to the 
defined state classes are determined by means of coordinates of the space. 
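
One possible reading of this idea is sketched below. It assumes a single 
state variable (a normalized wear intensity between 0 and 1), three state 
classes and triangular membership functions; the class names, the breakpoints 
and the fuzzy interpretation of the membership degrees are all illustrative 
assumptions, not definitions from the monograph. The analysed state is mapped 
to one point whose coordinates are its degrees of membership to the classes. 

```python
def triangular(x, left, peak, right):
    """Triangular membership function: 0 outside (left, right), 1 at peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical state classes over a normalized wear intensity in [0, 1].
STATE_CLASSES = {
    "fault-free": (-0.5, 0.0, 0.5),
    "degraded": (0.0, 0.5, 1.0),
    "damaged": (0.5, 1.0, 1.5),
}


def membership_vector(x):
    """Coordinates of the state in the space of state patterns."""
    return {name: triangular(x, *shape) for name, shape in STATE_CLASSES.items()}


def classify(x):
    """Return the state class with the highest degree of membership."""
    degrees = membership_vector(x)
    return max(degrees, key=degrees.get)


print(membership_vector(0.125))
print(classify(0.125))
```

In this sketch the exact value of the state variable is not reported; only 
the membership degrees and the resulting state class are, which matches the 
remark that identifying a state class is usually enough. 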

It is assumed in the monograph that the state of a complex object is con- 
sidered to be a set of states of this object’s elements. The reason for a state 
change is often the appearance of faults and different events that influence 
both the change of the quality of object operation and return to the normal 
state (fault-free). A formal relationship between faults and object states is de- 
fined in Section 1.4. One assumes that a fault is each maintenance event that 
causes (either directly or indirectly) the occurrence of such a risk of a decrease 
in the operation quality of the object (or the object’s element) which should 
be detected by means of a diagnostic process. Apart from the common faults 
that are often considered, one can also enumerate such events as: a power 
loss, a lack of raw materials at the input of technological apparatus, a wrong 
position of a hand valve switched by an operator, the appearance of parasitic 
reactions within a chemical reactor, excessive wear of the tread of car tyres, 
etc. Events of these kinds are called faults. This concept is used in bibliogra- 
phy dealing with the diagnostics of industrial processes (Isermann and Balle, 
1996). Particularly dangerous faults are called breakdowns (or failures). A 
general object being diagnosed can be a machine, a technological process 
or an automatic system. Assuming that a general object is considered, it is 
required to define a suitably general model of this object in order to carry 
out its further examination. 

At the beginning, two significantly different kinds of object models 
should be considered: 

• individual models, which describe only one selected object, 

• a group (population) of models which describe a set of similar objects 
i.e., models that may be interpreted as individual models which are common 
to all objects belonging to the set of objects considered. 

There are different classes of models that describe the examined ob- 
jects, diagnostic signals or inference processes related to diagnostic research. 
Structural models (Cempel, 1982), which map mutual interactions
between object elements, are often applied. They make it possible
to infer effectively the kinds of physical quantities whose changes should be
observed as signals while the experiment is being conducted. These signals
depend on the object state. Structural models also indicate the signal features
that are most sensitive to changes of object states. As a result
of the complexity of objects, numerous kinds of simplifications are required
to identify models. A direct effect of such simplifications can be a disagreement
between detailed inferences that are results of a diagnostic experiment
and inferences concerning diagnostic rules which are relationships between 
diagnostic signal features and features of the object state that are identified 
on the basis of an analysis of structural models. 

Taking as a basis the general concepts of systems theory (Bertalanffy, 
1968; Klir, 1969; Mesarovic and Takahara, 1975), it becomes possible to de- 
scribe the analysed objects by means of their inputs, outputs and states. 
General systems theory lets us consider both material systems, consisting of 
real operating objects, and abstract systems, which are models of different 
complex objects. An important property of all concepts concerning systems 
theory is the fact that a detailed list of all object elements is not sufficient 
to completely describe the system because relationships between system ele- 
ments (including the object environment) also need to be taken into account. 
A particular advantage of concepts derived from systems theory is that dif- 
ferent material systems can be considered by means of common abstract 
systems, which may be treated as their models. It enables us to make some 
interesting generalizations and analogies. Systems can be considered to be 
models of consecutive stages of an object’s life or of the process of its exis- 
tence (e.g., a model of the process of object recycling) as well as models of 
objects that appear at these stages. 

Deliberations concerning the diagnostics of machinery and technologi- 
cal processes require a definition of the object environment. That definition 
makes it possible to consider the examined object as a system separated from 
the environment. An advantage of this approach is the possibility to observe 
interactions between the environment and the examined objects. The ob- 
servation is performed by means of signals, which are any physical processes 
that carry information. In order to acquire the information, selected values of 
signal features (e.g., the RMS value within a given frequency band) are estimated.
To limit the number of concepts in use, one may assume that
the estimated values of signal features are considered as process variables.
Therefore variables can be values of directly measured signal features, values 
calculated on the basis of other quantities being effects of measurements, or 
values estimated by an automatic system as values of control signals. Such a 
definition of process variables also includes very important information about 
the conditions of the object operation. 
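
The estimation of a signal feature mentioned above can be illustrated with a minimal sketch that computes the RMS value of a sampled signal within a frequency band. The function name `band_rms`, the sampling rate and the band limits are assumptions introduced only for this illustration; a plain discrete Fourier transform is used for clarity, not efficiency.

```python
import cmath
import math


def band_rms(samples, fs, f_lo, f_hi):
    """RMS of the part of `samples` whose frequency lies in [f_lo, f_hi] Hz.

    samples: list of real values, fs: sampling rate in Hz (both assumed inputs).
    """
    n = len(samples)
    power = 0.0
    for k in range(n):
        # Frequency of DFT bin k; bins above n/2 are folded to |f|.
        f = min(k, n - k) * fs / n
        if f_lo <= f <= f_hi:
            xk = sum(samples[i] * cmath.exp(-2j * math.pi * k * i / n)
                     for i in range(n))
            power += abs(xk) ** 2
    # Parseval: mean square of the band-limited signal is power / n**2.
    return math.sqrt(power) / n
```

For a sine of amplitude 2 whose frequency falls inside the band, the function returns a value close to 2/sqrt(2); such a value would then be treated as one process variable.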

The following chain of mappings may be considered as a definition of the 
term diagnostic signal: 

(an operating object) and (the environment of the object) →

(interactions between the object and the environment) →

(signals) →

(process variables) = (the values of selected signal features) →
(diagnostic signals),







although it may be simplified: 

(an operating object) and (the environment of the object) →

(interactions between the object and the environment) →

(diagnostic signals).

A diagnostic signal is a process carrying information about the state of the 
diagnosed object. Diagnosing (state recognising) is considered to be the pro- 
cess of the detection and distinction of object faults. A diagnosis is an effect of 
recording, processing, analysing and estimating diagnostic signals. The diag- 
nosis can be performed with different degrees of accuracy. Depending on the 
kind of the object and the amount of information about the object, the iden- 
tification of faults or a general determination of state classes as a diagnosis 
is performed. 

Diagnosing, whose goal is to estimate a state, can be performed in the 
form of checking procedures. It is particularly recommended in the case of 
a two-class state space (usable/unusable). An alternative approach is the 
procedure that makes it possible to recognize a state when a large number of 
state classes are considered. 

There are two phases of state examination as distinguished by Rozwadowski
(1983): checking the object’s ability and fault isolation. The first phase,
ability checking, consists in recognizing the state of an object considered
as a whole. The simplest diagnosis is to identify whether the
object is able to operate or is non-operational. The state of its elements
is not recognized. The second phase, fault isolation, consists in recognising
the states of the object’s elements. In this case, the goal is to identify
elements and sub-assemblies which are non-operational and need to be
checked, repaired or replaced. For some classes of
objects, distinguishing between these two phases is often difficult or even
impossible. This is particularly the case when an exhaustive
procedure of quality checking is not defined or the whole procedure is not
applied for economic reasons.

Moreover, as many as three different phases of state examination are 
defined by Isermann and Balle (1996). These are the detection, isolation and 
identification of faults. One can characterise them as follows: 

• fault detection is the identification of fault appearance and the deter- 
mination of the moment of its detection; 

• fault isolation consists in determining the kind, place and time of the 
appearance of the fault; it follows the detection of that fault; 

• fault identification is the determination of the fault size and its
variability in time; it follows the isolation of that fault.

Moreover, one can give an example of a definition of fault diagnostics 
(diagnosis), which is considered to be an action including both fault isolation 
and its identification. The goal in this case is to determine the kind, size, place 
and time of fault appearance. This definition does not correspond
to the previous one, which included all phases of state examination. For the
purposes of this monograph, it is established that fault diagnostics includes
the detection, isolation and identification of faults.

Furthermore, a group of concepts that are accepted by the SAFEPRO- 
CESS Committee (Isermann and Balle, 1996) includes: 

• monitoring, which is a task whose goal is to collect and transform
process variables as well as recognize incorrect behaviour (alarm signals);
this task is performed in real time,

• supervision, which consists in object monitoring and making decisions
which help to ensure the correct operation of the object in the case of a fault,

• protection, which includes all operations and technical means that ei- 
ther eliminate a potentially dangerous course of a process or prevent the 
results of that course. 

In this monograph, the distinction between monitoring a process flow 
(the operation of an object) and monitoring a process state (an object state) 
is made. A specific example of a process is the maintenance of technical means 
(a machine or device). In this case, process monitoring is equivalent to object 
operation monitoring, which was defined above. The task of monitoring is 
carried out by means of the SCADA (Supervisory Control and Data Acquisition)
or DCS (Distributed Control Systems) systems. They make it possible
to process and archive the variables of processes and alarm signals, and to 
show the course of the process. 

Monitoring an object state will be understood as a task whose goal is to
diagnose an object (process) and to signal and, additionally, show in diagram
form a state or its changes (identified faults). The task should be carried out
in real time. Therefore, systems monitoring an object state are real-time
diagnostic systems. Apart from real-time tasks, these systems
can aid object maintenance in the field of the protection of the object. Such
a definition of monitoring corresponds to the concept of supervision, which is 
also used. Supervision is defined as either continuous diagnostics or a discrete 
(or also continuous) observation of an object state. 

Testing is the next term connected with diagnostics. This operation is 
understood to be a determined set of different tests that are conducted in
order to identify whether the values of useful properties of the analysed object
are within the range of determined parameters. Special kinds of these
operations are diagnostic tests, carried out in order to check whether the
criteria given by standards and recommendations are met.

A diagnosis that can be conducted automatically is also considered. In 
this case, the diagnostic test is understood as a set of operations performed 
(by software) taking into account the values of process variables. The goal of 
that operation is to check the correctness of the operation of a determined 
part of the object. As a result of these operations, a diagnostic signal
containing information about the effect of the check-up is generated. A negative
result is a symptom that is evidence of an incorrect state, e.g., fault
occurrence. Therefore, a symptom can be understood as the appearance of such
a value of the diagnostic signal that corresponds to an incorrect state of a
part of the object being diagnosed. This event is usually signalled by visual
or sound alarms.
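
An automatically conducted diagnostic test of the kind described above can be sketched as a simple range check on a process variable. The class name `LimitTest`, the variable names and the limits are hypothetical and serve only to illustrate how a negative test result becomes a symptom; real diagnostic tests may be considerably more elaborate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LimitTest:
    """Hypothetical diagnostic test: checks that a process variable stays in range."""
    variable: str
    low: float
    high: float

    def run(self, process_variables):
        """Return the diagnostic signal: 0 = check passed, 1 = symptom observed."""
        value = process_variables[self.variable]
        return 0 if self.low <= value <= self.high else 1


def symptoms(tests, process_variables):
    """Names of the variables whose tests gave a negative result."""
    return [t.variable for t in tests if t.run(process_variables) == 1]
```

With a test on a level bounded to the range 0.2-0.8 and a measured value of 0.95, the list returned by `symptoms` contains that variable, which would typically trigger an alarm.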



1.3. Aims of process diagnostics 

Over the last dozen or so years, a significant growth in the size and
complexity of technological installations in the power, chemical,
metallurgical and food industries has been observed. This is a result of attempts
to minimise the unit cost. A side-effect of this growth is an increase in the
concentration of measuring, actuating and controlling devices as well as
growth in the complexity of the automation systems that control these
processes.

In spite of the high reliability of elements used in such systems, faults of
technological installation components and automatic devices as well as
operator errors always occur. According to some specialists, breakdown
states, which commonly occur during object maintenance, are natural.
They cause significant and long-term disruptions in the course of the
manufacturing process, which decrease its productivity and sometimes can lead to its
termination. In these cases, economic losses are very large. Some breakdown
states can lead to a hazard to the environment or damage of a manufacturing
installation. They can also be dangerous to human life (Fig. 1.1).

Fig. 1.1. Causes and effects of breakdown states

Breakdown state identification is commonly connected with the appearance
of different alarms within a short time period. They result from the
fact that the state of a system depends on the states of its elements.
They are also effects of processes of state propagation (Psiuk, 2001). From an
operator’s point of view, the interpretation of these states may be difficult.
In such situations information overload often occurs. Its side-effect is
stress, which may lead to additional operator errors; these, accumulating
with faults that appeared previously, can cause serious breakdowns.

Problems of the diagnostics and protection of processes are still being
developed. The importance of this issue increases along with the growth of
the degree of automation and of the number of members of the object
servicing staff. It is obvious that an increase in the installation size causes
growth in the number of alarms. In typical power or chemical installations,
a few or several thousand alarms are signalled. Therefore, computer
systems, which enable us either to aid operators in performing a diagnosis or
to establish a rational diagnosis in an automatic way, are essential.

In comparison to diagnostic operations carried out by an operator, the 
automation of diagnostic operations makes it possible to significantly shorten 
the time of the identification and isolation of breakdowns. It considerably 
improves the set of reliability parameters of the system and causes an increase 
in economic effects. 

At the same time, diagnostic information that is exact and timely
makes it possible to determine (automatically or by staff)
appropriate protective actions that avoid or limit the results of faults
before consequences that are dangerous for the entire manufacturing process
appear. In this way, tolerance of some faults (resistance to faults) is
obtained.

Another goal of systematically carried out diagnostics is to decrease the 
costs of repairs. Check-ups of technological apparatus, control and measure- 
ment devices are usually performed periodically. A process is stopped and 
the devices are tested. That is carried out independently of their technical 
state. Some devices are disassembled and checked with the use of a service
stand. In the case of periodically performed services, opening the housing of 
a machine is often required, although in most cases it is unnecessary because 
the object’s state is good. The harmfulness of these procedures may be com- 
pared with the situation of a patient that is periodically operated on by a 
surgeon in order to check if he or she is healthy. The introduction of remote
and automatically carried out diagnostics, as well as service procedures and
repairs dependent on the technical state of the object, enables us to decrease
maintenance costs in comparison to periodically conducted diagnostics. The
control valve is one example: according to data given
by the Fisher-Rosemount company, the decrease in costs was about 60-70%.



1.4. General description of the diagnosed object

The first step of the identification of a mathematical description of the diag- 
nosed object (or model) should be the determination of its phenomenological 
model. According to the remarks listed above, one should stress the advan- 
tages of models that are based on general systems theory. Let us consider 
a system understood as a model of a real diagnosed technical object. The 
analysis of the real object will be indirectly performed as the analysis of
an abstract object, namely the system. The term object will be interpreted
depending on the context as either a real diagnosed object or an abstract
object, called a system.

Systems of technical objects can be divided into static and dynamic ones. 
The response of static systems to an input
change is immediate. The response of dynamic objects to an input change
depends not only on the present values of the inputs but also on the history
of their changes. Multi-dimensional static objects are described by means of sets
of algebraic equations, whereas physical phenomena taking place in dynamic 
objects are usually described with the use of differential equations which are 
often nonlinear (Cannon, 1967). 

A set of those differential equations can be expressed in the form of state 
equations. Equations of a dynamic object (or system) include the following 
kinds of courses (de Larminat and Thomas, 1983): 

• Input u (multi-dimensional input u), which represents influences of 
the environment on the object: 

$$u(t) = [\,u_1(t) \;\; u_2(t) \;\; \ldots \;\; u_p(t)\,]^T. \tag{1.1}$$

These courses are called object forces and excitations. In this group one can
distinguish control signals, $u_c$, and different excitations, $u_m$, whose values
are known. The remaining outside influences (their values are unknown) are
considered to be disturbances.

• Output y (multi-dimensional output y), which represents the object’s
influences on its environment. These influences can be considered as responses
of the object:

$$y(t) = [\,y_1(t) \;\; y_2(t) \;\; \ldots \;\; y_q(t)\,]^T. \tag{1.2}$$

• State variables x (multi-dimensional state coordinate x), which in- 
fluence the manner of transforming inputs into outputs: 

$$x(t) = [\,x_1(t) \;\; x_2(t) \;\; \ldots \;\; x_n(t)\,]^T. \tag{1.3}$$

The current state $x(t)$ determined at a given time moment $t$ is dependent
on both the state $x(t_0)$ determined at the initial time moment $t_0$ and the
history of $u(t_0, t)$ within a time period $(t_0, t)$. Therefore, the state vector
represents a sum of system responses and information about the past action
that is required for a present state change to be determined. The state vector
should include the smallest number of variables that is sufficient to describe
the system at each time moment. Since a single system can be described by
several different vectors including state variables, the selection of the analysed
variables is not unique.

The continuous dynamic object is determined by state and output 
equations: 



$$\dot{x}(t) = \phi[x(t), u(t)], \tag{1.4}$$

$$y(t) = \psi[x(t), u(t)]. \tag{1.5}$$



In these equations the influence of disturbances and faults was omitted. 

One can distinguish the following kinds of systems: 

a) Linear and non-linear systems. In the case of linear systems the su- 
perposition principle of inputs and outputs is met. The remaining systems 
are non-linear ones. 

b) Deterministic and stochastic systems. A system is deterministic if the
transformations (1.4) and (1.5) are unambiguous. When these transformations
have a random character, the system is stochastic.

c) Stochastic systems can be stationary and non-stationary. In order to 
simplify this problem, the stationary system may be understood as a system 
characterized by time-constant values of estimated statistics of its parame- 
ters. 
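
The superposition principle from item a) can be checked numerically for a static single-input system. The sketch below is an illustration only: the function name, the test inputs and the two example systems are assumptions, and passing the check for one pair of inputs does not prove linearity, whereas failing it demonstrates non-linearity.

```python
import math


def satisfies_superposition(system, u1, u2, a=2.0, b=3.0, tol=1e-9):
    """Check system(a*u1 + b*u2) == a*system(u1) + b*system(u2) for one input pair."""
    lhs = system(a * u1 + b * u2)
    rhs = a * system(u1) + b * system(u2)
    return abs(lhs - rhs) <= tol


def linear_gain(u):
    """Hypothetical linear static system: output proportional to input."""
    return 4.0 * u


def square_root_flow(u):
    """Hypothetical non-linear static system: flow proportional to sqrt of a head."""
    return 0.5 * math.sqrt(u)
```

The square-root characteristic is chosen because relations of this type appear in the flow equations of the tank example later in this section.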



A system is an object model in which some simplifications are assumed. 
They consist in omitting the inputs of unknown values. The existence of these 
inputs can be taken into account by introducing some additional inputs called 
disturbances: 

• Disturbances d (e.g., unknown changes of the surrounding tempera- 
ture, unknown fluctuations of power, etc.), which are interpreted as additional 
noise inputs. 

For a large class of objects, it is also possible to make
the assumption that fault occurrence can be interpreted as an additional
environmental influence on the object. That enables us to represent these
faults by means of additional system inputs, which will be called faults.

• Faults f are considered to be an additional group of inputs that repre- 
sent interactions influencing a change of the quality of the object operation: 

$$f(t) = [\,f_1(t) \;\; f_2(t) \;\; \ldots \;\; f_k(t)\,]^T. \tag{1.6}$$

The values of these inputs can change either by leaps (suddenly) or gradually.

A complete description of the dynamic system, taking into account the
influences of noise and fault inputs, can be represented by the following
equations:

$$\dot{x}(t) = \phi[x(t), u(t), d(t), f(t)], \tag{1.7}$$

$$y(t) = \psi[x(t), u(t), d(t), f(t)]. \tag{1.8}$$

A system diagram which takes into account the influence of noise and fault 
inputs is shown in Fig. 1.2.

Fig. 1.2. Diagram of a system which is the diagnosed object’s model.



Example. Let us consider a mathematical model of a process that occurs in
a set consisting of three chained tanks, shown in Fig. 1.3 (Koscielny, 2001).
The stream of liquid that flows into the first tank is forced with the use of a
pump and throttled by means of the control valve. The outflow from the third
tank is not controlled. Let us make the assumption that the level of the liquid
$L_3$ in the third tank should be stabilized by the control set to maintain a fixed,
determined value of this level. The position of the control set is controlled by
an output signal $U$ generated by the regulator. It was assumed that the process
of level changes in the tanks varies slowly and the liquid is incompressible.
The measured signals are: the stream of the medium $F$ that flows into the
first tank and the levels in the tanks $L_1$, $L_2$, $L_3$. The control signal $U$
generated by the control set is also known.




Fig. 1.3. Diagram of a set of three tanks ($U$ - the control signal, $L_1$, $L_2$, $L_3$ -
the levels of fluid in tanks, $F$ - a stream of liquid flowing into the tanks)

Let us now assume that the liquid flowing through is a toxic medium and
its flow out of the tank is very dangerous. It should be stressed that faults of 
the measured signals used as control ones in the regulator set as well as faults 
of control set devices (e.g., the servo-motor, the control valve, the pump) can 
have dramatic results. As examples one can give the overflowing of the tank
$Z_1$, a spill out or shortage of liquid in the tank $Z_3$. Results of faults of other
signals are not as dangerous, provided that they are not used in interlock and
protection systems. Because all measured signals are usually used for calculating
technical and technological balance indexes etc., faults of measuring channels
can cause a deterioration in the quality of an automatic system’s operation.

The set of tanks considered in the fault-free state can be given by the 
following physical equations: 



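$$F = k_v(S) \sqrt{\frac{\Delta P_z}{\rho}} = k_v[S(U)] \sqrt{\frac{\Delta P_z}{\rho}}, \qquad \Delta P_z, \rho \approx \text{const}, \tag{1.9}$$

$$A_1 \frac{dL_1}{dt} = F - Q_{12} = F - \alpha_{12} S_{12} \sqrt{2g(L_1 - L_2)}, \tag{1.10}$$

$$A_2 \frac{dL_2}{dt} = Q_{12} - Q_{23} = \alpha_{12} S_{12} \sqrt{2g(L_1 - L_2)} - \alpha_{23} S_{23} \sqrt{2g(L_2 - L_3)}, \tag{1.11}$$

$$A_3 \frac{dL_3}{dt} = Q_{23} - Q_3 = \alpha_{23} S_{23} \sqrt{2g(L_2 - L_3)} - \alpha_3 S_3 \sqrt{2g L_3}, \tag{1.12}$$

where $S$ is the open section area of the control valve, $k_v$ is a flow
coefficient of the control valve, $\Delta P_z$ denotes the pressure difference observed at
the control valve, $\rho$ is medium density, $Q_{12}$, $Q_{23}$, $Q_3$ are streams of liquid
flowing between the tanks and flowing out of the third tank, $A_1$, $A_2$, $A_3$ are
section areas of the tanks and $\alpha_{12}$, $\alpha_{23}$, $\alpha_3$ are flow coefficients, while $S_{12}$,
$S_{23}$, $S_3$ are section areas of the flow canal in the manual valves.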

The first equation describes a flow through the control valve. The remaining
equations (1.10)-(1.12) determine the balance of flows within the particular
tanks. They result from equations given by Bernoulli. In this case, a
system input is a control signal $u = U$ and the signals $y_1 = F$, $y_2 = L_1$,
$y_3 = L_2$, $y_4 = L_3$ are outputs.

The system of equations can be transformed into state equations. Let us
assume that state variables are levels related to individual tanks: $x_1 = L_1$,
$x_2 = L_2$, $x_3 = L_3$. The system of state equations and output equations is as
follows:

$$\dot{x}_1 = \frac{1}{A_1} F - \frac{1}{A_1} \alpha_{12} S_{12} \sqrt{2g(x_1 - x_2)} = \frac{1}{A_1} k_v[S(u)] \sqrt{\frac{\Delta P_z}{\rho}} - \frac{1}{A_1} \alpha_{12} S_{12} \sqrt{2g(x_1 - x_2)}, \tag{1.13}$$

$$\dot{x}_2 = \frac{1}{A_2} \alpha_{12} S_{12} \sqrt{2g(x_1 - x_2)} - \frac{1}{A_2} \alpha_{23} S_{23} \sqrt{2g(x_2 - x_3)}, \tag{1.14}$$

$$\dot{x}_3 = \frac{1}{A_3} \alpha_{23} S_{23} \sqrt{2g(x_2 - x_3)} - \frac{1}{A_3} \alpha_3 S_3 \sqrt{2g x_3}, \tag{1.15}$$

$$y_1 = k_v[S(u)] \sqrt{\frac{\Delta P_z}{\rho}}, \tag{1.16}$$

$$y_2 = x_1, \tag{1.17}$$

$$y_3 = x_2, \tag{1.18}$$

$$y_4 = x_3. \tag{1.19}$$
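
The fault-free state equations of the three tanks can be explored numerically. The following is a minimal sketch, assuming a constant inflow F (instead of the valve characteristic), hypothetical parameter values, forward flow only (the flow is set to zero when the head is not positive) and simple explicit Euler integration; none of these choices come from the source.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def flow(alpha_s, head):
    """Flow alpha*S*sqrt(2*g*head); assumed zero when the head is not positive."""
    return alpha_s * math.sqrt(2.0 * G * head) if head > 0.0 else 0.0


def derivatives(x, inflow, p):
    """Right-hand sides of the three level equations for x = (x1, x2, x3)."""
    q12 = flow(p["a12S12"], x[0] - x[1])   # flow from tank 1 to tank 2
    q23 = flow(p["a23S23"], x[1] - x[2])   # flow from tank 2 to tank 3
    q3 = flow(p["a3S3"], x[2])             # free outflow from tank 3
    return ((inflow - q12) / p["A1"],
            (q12 - q23) / p["A2"],
            (q23 - q3) / p["A3"])


def simulate(x0, inflow, p, dt=0.5, steps=40000):
    """Explicit Euler integration; returns the levels after steps*dt seconds."""
    x = list(x0)
    for _ in range(steps):
        dx = derivatives(x, inflow, p)
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
    return x


# Hypothetical parameters: tank areas in m^2, products alpha*S in m^2.
PARAMS = {"A1": 1.0, "A2": 1.0, "A3": 1.0,
          "a12S12": 1e-3, "a23S23": 1e-3, "a3S3": 1e-3}
```

In the steady state all three flows equal the inflow, so with identical valves each head difference equals (F / (alpha S))^2 / (2g); the simulated levels should approach one, two and three times this value for the third, second and first tank respectively.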



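The influence of faults on the output can be taken into account by means
of physical equations. For example, equations corresponding to three tanks
determined taking into account the influence of faults can be defined as
follows:

$$F + \Delta F = k_v[S(U + \Delta U) + \Delta S] \sqrt{\frac{\Delta P_z + \Delta P_p}{\rho}}, \tag{1.20}$$

$$A_1 \frac{d(L_1 + \Delta L_1)}{dt} = (F + \Delta F) - \alpha_{12}(S_{12} + \Delta S_{12}) \sqrt{2g[(L_1 + \Delta L_1) - (L_2 + \Delta L_2)]} - Q_1, \tag{1.21}$$

$$A_2 \frac{d(L_2 + \Delta L_2)}{dt} = \alpha_{12}(S_{12} + \Delta S_{12}) \sqrt{2g[(L_1 + \Delta L_1) - (L_2 + \Delta L_2)]} - \alpha_{23}(S_{23} + \Delta S_{23}) \sqrt{2g[(L_2 + \Delta L_2) - (L_3 + \Delta L_3)]} - Q_2, \tag{1.22}$$

$$A_3 \frac{d(L_3 + \Delta L_3)}{dt} = \alpha_{23}(S_{23} + \Delta S_{23}) \sqrt{2g[(L_2 + \Delta L_2) - (L_3 + \Delta L_3)]} - \alpha_3(S_3 + \Delta S_3) \sqrt{2g(L_3 + \Delta L_3)} - Q_3, \tag{1.23}$$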

where $f_1$ denotes faults of the measuring channel of a liquid stream at the
inflow to the tank $Z_1$, observed as a measurement error $\Delta F$, $f_2$ denotes the faults
of the measuring channel of the level in the tank $Z_1$ observed as a measurement error
$\Delta L_1$, $f_3$ denotes faults of the measuring channel of a level in the tank $Z_2$
observed as a measurement error $\Delta L_2$, $f_4$ denotes faults of the measuring
channel in the tank $Z_3$ observed as a measurement error $\Delta L_3$, $f_5$ refers to
faults of the control signal observed as errors $\Delta U$, $f_6$ denotes faults of the
control set consisting in a change of nominal characteristics $S(U)$ that causes
a change of the section area of the open valve $\Delta S$, $f_7$ refers to faults of the
pump that cause a change of pressure before the valve $\Delta P_p$, $f_8$ is the lack of
the medium before the pump that causes a change of pressure before the valve
$\Delta P_p$, $f_9$ denotes the clogging of the canal between the tanks $Z_1$ and $Z_2$ that
causes a change of the section area $\Delta S_{12}$, $f_{10}$ denotes clogging between the
tanks $Z_2$ and $Z_3$ that causes a change of the section area $\Delta S_{23}$, $f_{11}$ denotes
the clogging of the outflow canal from the tank $Z_3$ that causes a change
of the section area $\Delta S_3$, $f_{12}$ is a spill out of the tank $Z_1$, $Q_1$, $f_{13}$ is a spill
out of the tank $Z_2$, $Q_2$, and $f_{14}$ is a spill out of the tank $Z_3$, $Q_3$.

Therefore, the system (1.20) - (1.23) can be represented by the following 
equations: 



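$$F + f_1 = k_v[S(U + f_5) + f_6] \sqrt{\frac{\Delta P_z + f_7 + f_8}{\rho}}, \tag{1.24}$$

$$A_1 \frac{d(L_1 + f_2)}{dt} = (F + f_1) - \alpha_{12}(S_{12} + f_9) \sqrt{2g[(L_1 + f_2) - (L_2 + f_3)]} - f_{12}, \tag{1.25}$$

$$A_2 \frac{d(L_2 + f_3)}{dt} = \alpha_{12}(S_{12} + f_9) \sqrt{2g[(L_1 + f_2) - (L_2 + f_3)]} - \alpha_{23}(S_{23} + f_{10}) \sqrt{2g[(L_2 + f_3) - (L_3 + f_4)]} - f_{13}, \tag{1.26}$$

$$A_3 \frac{d(L_3 + f_4)}{dt} = \alpha_{23}(S_{23} + f_{10}) \sqrt{2g[(L_2 + f_3) - (L_3 + f_4)]} - \alpha_3(S_3 + f_{11}) \sqrt{2g(L_3 + f_4)} - f_{14}. \tag{1.27}$$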

As a result of these equations, one can obtain a mathematical model of the
analysed object that takes into account the influence of faults. The
equations (1.24)-(1.27) can also be transformed into the state and output
equations (1.7) and (1.8), which consider faults. It should be stressed that in the
general case, individual faults are time functions $f_i = f_i(t)$. Faults can appear
either suddenly or gradually. This concerns faults of measurement lines and control
devices as well as faults of technological installations (spill out, clog, etc.).
Modelling fault influences on object outputs is usually very difficult and
laborious, and in the case of a complex technological installation even
impossible to carry out.
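
The statement that faults are time functions which appear either suddenly or gradually can be illustrated with two simple fault profiles. This is only a sketch: the function names, the linear growth law and the numerical values are assumptions made for the illustration, not fault models taken from the source.

```python
import math


def abrupt_fault(t, t_start, size):
    """Step-like fault: zero before t_start, constant `size` afterwards."""
    return size if t >= t_start else 0.0


def incipient_fault(t, t_start, rate, final_size):
    """Ramp-like fault: grows linearly (|rate| per second) until |final_size|."""
    if t < t_start:
        return 0.0
    magnitude = min(abs(rate) * (t - t_start), abs(final_size))
    # Keep the sign of final_size, e.g. negative for a shrinking section area.
    return math.copysign(magnitude, final_size)
```

For instance, a hypothetical clogging fault that slowly reduces a section area by up to 2e-4 m^2 could be described by `incipient_fault(t, 10.0, 1e-5, -2e-4)`, whereas a sudden blockage of the same size would be `abrupt_fault(t, 10.0, -2e-4)`.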

The main aim of diagnostics is to identify states of technical objects. An
object state is a result of changes caused by broadly understood faults that 
appear slowly or suddenly and are effects of the ageing and wear of objects 
or an incorrect adjustment of their subassemblies, which causes changes of 
the quality of the object’s operation. The object state can be differently un- 
derstood in comparison to the system state (1.3) included in (1.4) and (1.5). 
Determining the transformation of a system state into an object state or 
defining the interpretation of a system state requires correct knowledge of a 
branch dealing with the object being diagnosed. As a simplification, one can
assume that all causes of a change of the technical
state of an object are considered as faults. For example, the erosion
of a valve plug or valve seat, occurring as slow changes related to wear
processes, is interpreted as a fault of a control set. However, the assumption
about faults as a representation of wear requires making a particularly careful
distinction between state variables. In this case, one can distinguish variables
that play the role of a system memory and variables which represent
the results of destructive phenomena taking place within an object. An
alternative solution (which is not applied here) is to distinguish additional 
inputs that represent wear. 

According to the above assumptions, the technical state of the object 
z{t) is a fault function defined as follows: 

$$z(t) = z[f(t)]. \tag{1.28}$$

If one could determine a fault vector $f$, which is based on equations of
both the state (1.7) and the output (1.8) of the object (or a set of physical
equations that describe the object) with a simultaneous lack of disturbances
(noise inputs):

$$f(t) = [y(t), x(t), u(t)], \quad d = 0, \tag{1.29}$$

a diagnostic problem would be solved. 

Such a problem is usually very difficult or even impossible to solve. Even 
though the equations (1.7) and (1.8) are known, the identification of inverse 
models taking the form (1.29) is not always possible. These equations often 
have a complicated form and in reality the number of faults is always higher 
than the number of equations that describe the object. For instance, the four 
equations (1.24)-(1.27) describing the object include 14 faults, therefore the 
set does not have a unique solution. 
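
The lack of a unique solution can already be seen in a single equation. The following worked example is only an illustration: the numerical values and the auxiliary symbol $F_0$ are assumptions. Take the valve equation with all faults equal to zero except $f_1$ (flow measurement) and $f_7$ (pump pressure), and let $F_0$ denote the flow predicted by the fault-free model:

$$F + f_1 = F_0 \sqrt{1 + \frac{f_7}{\Delta P_z}}, \qquad F_0 = k_v[S(U)] \sqrt{\frac{\Delta P_z}{\rho}}.$$

Suppose that $F_0 = 2.0$ and the measured value is $F = 1.8$ (in the same units). Both of the following fault hypotheses satisfy the equation exactly:

$$f_1 = 0.2,\; f_7 = 0 \qquad \text{or} \qquad f_1 = 0,\; f_7 = -0.19\,\Delta P_z,$$

since $\sqrt{1 - 0.19} = 0.9$ and $2.0 \cdot 0.9 = 1.8$. In fact every $f_7 > -\Delta P_z$ has a matching $f_1$, so one equation with two unknown faults admits a one-parameter family of solutions; additional information is needed to isolate the fault.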



1.5. Basic concepts of process diagnostics 

Taking into account the diagnostics of processes, one can enumerate the fol- 
lowing processes as subjects of diagnostic investigations: 

• the technological process, 

• the process of the maintenance of machines and devices.

The meaning of the term technological process state is dependent on the
kind of this process. To generalize, one can assume that the technological 
process state is a set of deviations of the variables of this process from a 
pattern set. However, the state of the process of the maintenance of machines 
and devices is usually defined as a sum of states of elements of a technological 
installation with measurement and control devices. It means that the diag- 
nostics of the maintenance process is connected with the consideration of a 
fault set that includes faults of the elements of a technological installation, 
faults of measuring channels and faults of control devices (Fig. 1.4). 



Fig. 1.4. Diagram of the diagnosed system (labels: actuator faults, component faults, sensor faults; unknown inputs: parameter variations, disturbances, noise)

Considering the faults of selected measuring channels as faults that only
influence the state of a technological process and do not influence the machine
or device is justified in some cases.

According to a comprehensive approach to the diagnostics of an object, 
a set of the analysed destructive phenomena that are interpreted as faults 
should also include undesirable process states (e.g., the appearance of par- 
asitic reactions in a chemical reactor), wear processes, a lack of power or a 
lack of raw materials at the inputs of technological apparatus, etc. It is ob- 
vious that all possible faults cannot be considered by each method. Some of 
them (e.g., a bank of observers) were developed in order to consider a partic- 
ular group of faults (e.g., measuring lines, control devices or the elements of 
technological installations) assuming at the same time a lack of other faults. 
Numerous methods omit faults of measuring channels and then signals are 
treated as accurate and fault-free. 

In the general case, there are three phases in the diagnostic process
(Isermann and Balle, 1996): the detection, isolation and identification of
faults. In practice, the identification phase appears rarely and is sometimes
combined with fault isolation. Thus, in most cases the diagnostic process
includes only two phases: fault detection and isolation. Detection consists
in identifying fault symptoms on the basis of the results of process variable
transformation (either with or without the use of an object’s models).
As a result of the isolation phase, the faults which occurred are indicated.
This indication is made with the use of previously identified symptoms.
In the case of a lack of other symptoms, inference about fault appearance
requires great caution. A mistake that can be made in this case consists in
inferring on the basis of false premises. In some cases, instead of the fault
isolation phase, one can distinguish the phase of the identification of an
object state or a state class. A diagram of inference that includes these two
phases is shown in Fig. 1.5.



[Figure: the process with inputs U, outputs Y and faults F, followed by a
fault detection block.]

Fig. 1.5. Diagram of diagnosing with the phases of fault detection,
and isolation or identification of the process state.



Fault detection can be performed either with or without the use of an
object’s model. In the first case, the detection phase includes generating
residuals with the use of models (analytical, neural, rough, fuzzy, etc.) and
evaluating residual values. The evaluation consists in transforming
quantitative residuals into qualitative diagnostic signals and deciding
whether symptoms have been identified. A diagram of diagnosing with the use
of process models is shown in Fig. 1.6.

If information about the whole model of an object is lacking, or the model
is excessively complex, limit checking and the checking of simple relations
between process variables are used in order to detect faults. This approach
is applied when the basis for the estimation of an object state is formed by
the results of residual processes accompanying the operation of this object.
This method of diagnosing is shown in Fig. 1.7.

A kind of diagnosing directly based on continuous process variables (input
and output signals) may also be applied. In this case, the phases of fault
detection and isolation are joined (Fig. 1.8). These diagnosing procedures,
concerning the classification of faults or object states, are mainly applied
with the use of neural networks. In this case, it is necessary to acquire
knowledge related to the transformation of a space X that includes values of
process variables into a fault set F or an object state Z. The acquisition of
this knowledge is very difficult and sometimes impossible in the case of
dynamical objects. The reasons for that are the variability of the working
conditions of an object and the interactions of control systems.

Fig. 1.6. Diagram of diagnosing using process models

Fig. 1.7. Diagram of diagnosing using detection without the process model

[Figure: process variables are mapped directly to the process state Z.]

Fig. 1.8. Diagram of diagnosing with joined
phases of fault detection and isolation



If diagnosing is considered to be the process of pattern recognition, two 
phases are usually distinguished: the symptom extraction phase and fault or 
object states classification (Fig. 1.9). The first phase corresponds partially to 
fault detection, whereas the second one is related to fault isolation. 

Objects’ models can be used in the extraction phase. If appropriate mod- 
els are unknown, diagnostic signals are results of the spectral or statistical 
analysis of process variables. 

For a large number of diagnosed objects, including technological processes,
complex numerical simulation models are known. A direct application of these
models in order to estimate residuals is in general impossible. Diagnosing
such objects may be performed with the use of inverse diagnostic models
(Cholewa and White, 1993). Suggestions of effective numerical methods of
inverting multi-dimensional dynamical models of machines and devices evoke an
increase in the interest in simulation diagnostics (Cholewa and Kiciński,
1997). The initial phase of research consists in determining complex
numerical simulation models that allow generating signals corresponding to
interactions characteristic for complex states of an object. The verification
of these models is carried out with the use of real objects and laboratory
stands. Inverse models are determined on the basis of the verified models.
The inverse models are the diagnostic relationships that are looked for.
These relationships transform diagnostic symptoms into technical state
classes. The need for such operations is a result of the fact that a direct
formulation of diagnostic models as well as the analytical inverting of
multi-dimensional dynamic models is impossible. A great difficulty of the
discussed problem is related to the fact that the result of inverting causal
simulation models does not exist, as opposed to the causal relation.

Fig. 1.9. Diagnostics as the process of pattern recognition

A diagnostic system should make it possible to distinguish faults and 
object states with determined accuracy. The possibility of making this di- 
vision depends, among other things, on the properties of the object being 
diagnosed. In order to achieve the required distinction, a proper set of de- 
tection algorithms should be developed. It is connected with the necessity 
of equipping the object with a given set of measurement devices. Methods 
of analysing fault (object state) distinction are dependent on the notation 
method used for the relationships between diagnostic signals and faults. 

At the same time, the requirement of a minimal number of the analysed
diagnostic signals that ensures the correct distinction between faults and
object states is often imposed. This problem is usually considered in machine
diagnostics. In these applications, signal analysis methods are usually used
for extracting features. Numerous features can be estimated for each signal.
Not all of them are informative enough and essential to identify an object
state. This is often the reason for reducing and selecting information, which
consists in transforming quantitative features into qualitative ones,
selecting the most important features and rejecting the remaining ones.
Diagnostics with the use of object models rarely faces the problem of
residual reduction. Examples include installations that contain complex
measurement systems. Obtaining additional residuals, which can increase fault
distinction, is usually difficult. It should also be stressed that an excess
of information about the object state can be purposeful and may serve, e.g.,
to increase the credibility of diagnostic inference.



1.6. Summary 

The monograph is a result of the cooperation of numerous teams, which
concentrate on broadly understood technical diagnostics. The problems that
were considered as particularly important are included here. However, it
should be stressed that some limitations were assumed in order to solve the
issues. The monograph is not an exhaustive description of problems whose
solutions are known.

The presented issues require further research. It seems that the main 
directions of the research will be: 

• a further development of inference methods connected with the ap- 
plication of artificial intelligence; these methods enable us to apply generally 
understood expert systems and formalized manners of knowledge acquisition, 
notation, negotiation and generalization; 

• a further development of object modelling methods and their imple- 
mentation in diagnostics, differential diagnostics, diagnostic inverse models, 
belief networks; 

• the development of new methods of signal analysis, including, among 
other things, high order spectra, multidimensional time spaces, concepts of 
local time scales, wavelet transformations; 

• the development of the so-called smart transducers, which take advantage
of miniaturization and make it possible to perform signal analysis and
diagnostic inference processes directly within the transducer;

• the development of intelligent control devices executing functions of 
auto-diagnostics; 

• the usage of diagnostic methods in systems that tolerate faults (resistant
to single faults);

• attempts at the unification of database structures connected with tech- 
nical diagnostics, which allows applying uniform software aiding diagnostic 
research and will help exchange information dealing with research results; 

• including diagnostics in management systems, particularly those that 
aid management processes and production means. 

Another distinct problem is the need to continue works on the standardization 
of the applied terminology. However, in this case a complete consensus should 
not be expected. Instead of the expected improvement it can have a negative 
influence on the range of research and the variety of the applied methods. 



Chapter 2



MODELS IN THE DIAGNOSTICS OF 
PROCESSES 



Jan Maciej KOSCIELNY* 



2.1. Introduction 

Due to the existence of various classes of diagnosed systems, different kinds 
of models are used in diagnostics studies which are being developed. A gen- 
eral description of a system that takes the effects of faults into account is 
usually impossible, and even if such a description exists, the dependence that 
characterises particular faults cannot be defined on the grounds of it. There- 
fore, different kinds of simplified models are applied in diagnostics. The most 
important of them are described in the present chapter. Models used for 
fault detection as well as models applied to fault isolation or system state 
recognition are singled out. Among models used for fault detection, the most 
important are analytical, neural, and fuzzy ones. A great variety of models 
are applied to fault isolation or system state recognition. These models de- 
fine the relationship that exists between diagnostic signals (symptoms) and 
faults or system states. Models that map binary, multi-value or continuous 
diagnostic signals into the space of system faults or states are described. 

The presented description of models is not comprehensive, and many kinds 
of models are discussed very briefly. A more complete description of many 
of them has been given in chapters that deal with particular methods of 
diagnosing. Nevertheless, a general presentation of models used in diagnos- 
tics may prove to be useful, especially for readers who are just beginning 
to study technical diagnostics, and it should help to understand particular 
problems. 

2.2. Relations in diagnostics

When classifying models applied to the diagnostics of processes (systems),
it is possible to distinguish models of systems applied to fault detection,
and models used for fault isolation or system state recognition. Models used
for fault detection describe relationships existing within the system between
the input and output signals U ⇒ Y (usually in a normal state, i.e., without
faults), and allow detecting changes (symptoms) caused by the faults. Models
used for fault isolation define the relationship existing between diagnostic
signals and faults S ⇒ F. Models that map the space of diagnostic signals
into the space of system states S ⇒ Z are necessary for recognising the
system’s technical states.

Analytical models as well as fuzzy and neural ones are applied to fault
detection. These models usually describe the system in the normal state (i.e.,
without faults). They allow calculating residuals reflecting divergences that
may exist between the observed operation of the system and the normal
operation defined by the model. The residuals are most often calculated as
differences between the measured and the modelled output signals. Residual
values in the state without faults should oscillate around zero. Residual
values that differ from zero are symptoms of faults.

Residual values are rarely used directly for fault isolation. The binary or
multi-value evaluation of residual values is usually carried out, and
inference about faults is carried out on the basis of diagnostic signals that
were converted in such a way. A classifier is needed for the conversion of
continuous residual signals into qualitative (e.g., binary or multi-value)
diagnostic signals R ⇒ S. Diagnostic signal values that testify to the
existence of faults are called symptoms.
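
The conversion R ⇒ S can be illustrated with a minimal sketch of threshold-based residual evaluation. The function names, the threshold value and the sample residuals below are assumptions made for illustration only; in practice the threshold would have to be chosen with regard to noise and modelling errors.

```python
from typing import List


def binary_signal(residual: float, threshold: float) -> int:
    """Return 1 (symptom) if |residual| exceeds the threshold, else 0."""
    return 1 if abs(residual) > threshold else 0


def three_valued_signal(residual: float, threshold: float) -> int:
    """Return -1, 0 or +1 depending on the sign of a significant residual."""
    if residual > threshold:
        return 1
    if residual < -threshold:
        return -1
    return 0


# Assumed sample residuals and an assumed tolerance
residuals: List[float] = [0.02, -0.01, 0.35, -0.40]
threshold = 0.1
binary = [binary_signal(r, threshold) for r in residuals]
ternary = [three_valued_signal(r, threshold) for r in residuals]
```

The three-valued variant keeps the sign of the residual, which may help to distinguish faults that produce the same binary symptoms.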

Models applied to fault isolation (state recognition) should map the space 
of diagnostic signals into the space of faults (system states). The relationships 
can be defined on the basis of analytical modelling, taking into account the 
effect of faults, training, or an expert’s knowledge. 
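
One simple way of representing a relation S ⇒ F for binary diagnostic signals is a table of fault signatures. The sketch below is only an illustration: the fault names, the signatures and the function `isolate` are hypothetical and are not taken from the text.

```python
from typing import Dict, List, Tuple

# Hypothetical signatures: expected values of (s1, s2, s3) for each fault
SIGNATURES: Dict[str, Tuple[int, ...]] = {
    "f1": (1, 0, 0),
    "f2": (1, 1, 0),
    "f3": (0, 1, 1),
}


def isolate(signals: Tuple[int, ...]) -> List[str]:
    """Return the faults whose signature equals the observed signals."""
    if not any(signals):
        return []  # no symptom observed, so no fault is indicated
    return [fault for fault, sig in SIGNATURES.items() if sig == signals]
```

For example, `isolate((1, 1, 0))` indicates the fault `f2`, while an observed combination that matches no signature yields an empty list, which would require further analysis.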

If hardware redundancy is applied, the relationships result directly from
the redundant structure. Let us notice that the relations S ⇒ F and S ⇒ Z
are inverse models, i.e., result-cause-type relationships. Relations used
in the process of diagnosing on the basis of system models are presented
in Fig. 2.1.

When models of the system are not known, diagnostic signals are calculated
on the basis of the classification of system output signal values Y ⇒ S or
their attributes A(Y) ⇒ S. Simplified diagrams of diagnostic inference that
use the relationship existing between process variables (system inputs and
outputs) and faults X ⇒ F or technical states X ⇒ Z are also applied.

Fig. 2.1. Diagram of diagnosing on the basis of
system models with the applied relations marked



2.3. Models applied to fault detection 

The analytical redundancy of a measurement line exists when an addition- 
al value of a process variable is obtained (calculated) on the grounds of a 
mathematical model that connects the calculated variable with other mea- 
sured signals. Mathematical models are applied to the calculation of process 
variable values instead of the application of redundant measuring devices in 
the system structure. Analytical redundancy is used for fault detection. It 
gives the ground for the generation of residuals as differences between the 
measured and calculated values of signals. 

Figure 2.2 presents a general diagram of residual generation with the use 
of the analytical redundancy of measurements. The analytical model of the di- 
agnosed system contains all relationships existing between process variables. 
All physical redundant relationships existing between process variables can 
be derived from the model. 

Fig. 2.2. Diagram of residual generation with the
use of the analytical redundancy of measurements



It should be stressed that analytical redundancy models directly or indi- 
rectly the normal operation of a certain part of the installation. An incorrect 
result of signal value calculation can therefore be caused both by faults of 
the measurement lines of these process variables on the grounds of which the 
variable analytical value is calculated, and by faults of the modelled elements 
of the installation. Omitting this fact during diagnosing can lead to the in- 
terpretation of installation element faults as faults of sensors, and such errors 
can have far-reaching consequences. 

It is possible to single out the following models belonging to the group of 
analytical models applied to fault detection (Chen and Patton, 1999; Gertler, 
1998; Koscielny, 2001; Patton et al, 1989): 

• physical models (equations of movement, balance equations, etc.), 

• input-output-type linear models (continuous or discrete transmittances),

• state linear equations, 

• state observers and Kalman filters. 

Fuzzy and neural models are universally used for fault detection beside 
analytical models. In this case it is possible to speak about information re- 
dundancy, of which analytical redundancy is a particular case. The fault de- 
tection diagram is similar to the one presented in Fig. 2.2. A short discussion 
of models applied to fault detection is given below. 

2.3.1. Physical equations 

In general, a complete model of a system can be directly obtained from phys- 
ical equations (Cannon, 1967). Non-linear static systems having one input 
and many outputs are described by equations that have the following shape: 

$$\Phi(y, u) = 0. \tag{2.1}$$

This equation describes the system in the state of complete efficiency. The
appearing faults result in the fact that the above relationship is not
fulfilled. A fault symptom is therefore a residual value different from zero.
It can be calculated as follows:

$$r = \Phi(y, u). \tag{2.2}$$

Faults of dynamic systems having one input and one output and defined
by non-linear differential equations that have the form

$$\Phi\left(y, y', y'', \dots, y^{(n)}, u, u', u'', \dots, u^{(m)}\right) = 0 \tag{2.3}$$

can be detected on the grounds of the evaluation of residual values calculated
from these equations:

$$r = \Phi\left(y, y', y'', \dots, y^{(n)}, u, u', u'', \dots, u^{(m)}\right). \tag{2.4}$$

A similar model describes dynamical systems that have one input and many
outputs.

For instance, the following residuals are generated on the grounds of the
equations (1.9)-(1.12) by subtracting the right-hand side of a particular
equation from its left-hand side:

$$r_1 = F - \Phi(U), \tag{2.5}$$

$$r_2 = F - \alpha_{12} S_{12} \sqrt{2g(L_1 - L_2)} - A_1 \frac{dL_1}{dt}, \tag{2.6}$$

$$r_3 = \alpha_{12} S_{12} \sqrt{2g(L_1 - L_2)} - \alpha_{23} S_{23} \sqrt{2g(L_2 - L_3)} - A_2 \frac{dL_2}{dt}, \tag{2.7}$$

$$r_4 = \alpha_{23} S_{23} \sqrt{2g(L_2 - L_3)} - \alpha_3 S_3 \sqrt{2g L_3} - A_3 \frac{dL_3}{dt}. \tag{2.8}$$

Models elaborated on the basis of physical equations describe most com- 
pletely the relationships existing between process variables. Therefore, they 
allow detecting faults of small sizes. The elaboration of models on the grounds 
of physical equations is extremely difficult or even impossible for many sys- 
tems, and parameter identification provides its own, additional difficulties. 
Thus, the application of this method is limited to systems that are described 
by relatively simple equations. 
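
A residual based on a physical balance equation can be sketched for a single tank with free outflow. The balance A dL/dt = F − αS√(2gL), the parameter values and the function name are assumptions chosen for illustration; the derivative is approximated by a backward difference of two successive level measurements.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def tank_residual(F, L_prev, L_now, dt, area, alpha, S):
    """r = F - alpha*S*sqrt(2*g*L) - A*dL/dt, with a backward difference."""
    dL_dt = (L_now - L_prev) / dt
    outflow = alpha * S * math.sqrt(2.0 * G * L_now)
    return F - outflow - area * dL_dt


# Assumed parameters: outflow coefficient, outlet cross-section, tank area
alpha, S, area, level = 0.6, 0.002, 0.5, 1.0

# Steady state: constant level and inflow equal to outflow
F_balanced = alpha * S * math.sqrt(2.0 * G * level)
r_ok = tank_residual(F_balanced, level, level, 1.0, area, alpha, S)

# The same state, but the flow sensor reads 10 % too high
r_fault = tank_residual(1.1 * F_balanced, level, level, 1.0, area, alpha, S)
```

In the fault-free steady state the residual equals zero, whereas the biased flow measurement shifts it by the amount of the bias. With noisy level measurements the difference quotient would amplify the noise, so some filtering would probably be needed.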

Linear models of systems are often applied to fault detection. They are 
much simpler than non-linear models. Two basic forms of such models, i.e., 
state equations and transmittances, are discussed in successive subchapters. 

2.3.2. State equations of linear systems 

A dynamic stationary linear system that has p inputs and q outputs can
be described by the state and output equations with continuous time of the
form

$$\dot{x}(t) = A x(t) + B u(t), \tag{2.9}$$

$$y(t) = C x(t) + D u(t), \tag{2.10}$$

or those with discrete time:

$$x(k+1) = A x(k) + B u(k), \tag{2.11}$$

$$y(k) = C x(k) + D u(k), \tag{2.12}$$

where A is the system matrix of dimensions (n × n), B is the control
(input) matrix of dimensions (n × p), C is the response (output) matrix of
dimensions (q × n), and D is the matrix of dimensions (q × p).

The block diagram of a linear multi-dimensional system is shown in 
Fig. 2.3. System description in the shape of state equations is the basis for 
residual generation on the grounds of the so-called time redundancy (Chow 
and Willsky, 1984). Moreover, state equations are the basis for the design of 
diagnostic observers discussed in the following part of the chapter. 
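
The discrete-time state and output equations can be iterated directly. The following sketch uses plain Python lists; the matrices and the input sequence are assumed values for a system with two states, one input and one output.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def mat_vec(M: Matrix, v: Vector) -> Vector:
    """Product of a matrix and a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def vec_add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum of two vectors."""
    return [x + y for x, y in zip(a, b)]


def simulate(A, B, C, D, x0, inputs):
    """Iterate x(k+1) = A x(k) + B u(k), y(k) = C x(k) + D u(k)."""
    x, outputs = list(x0), []
    for u in inputs:
        outputs.append(vec_add(mat_vec(C, x), mat_vec(D, u)))
        x = vec_add(mat_vec(A, x), mat_vec(B, u))
    return outputs


# Assumed example: n = 2 states, p = 1 input, q = 1 output
A = [[0.5, 0.0], [0.0, 0.25]]
B = [[1.0], [1.0]]
C = [[1.0, 1.0]]
D = [[0.0]]
y = simulate(A, B, C, D, [0.0, 0.0], [[1.0]] * 3)
```

Outputs obtained in this way, compared with the measured ones, are a possible basis for residual generation.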




Fig. 2.3. Block diagram of a linear multi-dimensional system 



2.3.3. State observers 

Let us assume that state equations have the following form:

$$x(k+1) = A x(k) + B u(k), \tag{2.13}$$

$$y(k) = C x(k). \tag{2.14}$$

In (Chen and Patton, 1999; Gertler, 1998), such a shape of state equations
is used for constructing a diagnostic observer. The observer is an algorithm
whose application makes it possible to approximate a dynamic system state
on the basis of the input and output signals. The equations of a complete
observer have the form

$$\hat{x}(k+1) = A\hat{x}(k) + B u(k) + H[y(k) - C\hat{x}(k)], \tag{2.15}$$

$$\hat{y}(k) = C\hat{x}(k), \tag{2.16}$$

where $\hat{x}$ is an approximate of a state, $\hat{y}$ is an approximate of
an output, and H is the observer feedback matrix. The vector of residuals is
therefore defined by the following equation:

$$r(k) = y(k) - C\hat{x}(k). \tag{2.17}$$

Faults f and disturbances d are modelled in state equations as follows:

$$x(k+1) = A x(k) + B u(k) + E d(k) + F f(k), \tag{2.18}$$

$$y(k) = C x(k) + \Delta y, \tag{2.19}$$

where E is the disturbance input matrix, F is the fault input matrix of
system components and actuators, and Δy represents faults of the measurement
lines. The observer takes therefore the following form:

$$\hat{x}(k+1) = A\hat{x}(k) + B u(k) + H[y(k) - C\hat{x}(k)] + E d(k) + F f(k), \tag{2.20}$$

$$\hat{y}(k) = C\hat{x}(k) + \Delta y. \tag{2.21}$$

Diagnostic observers are applied to fault detection algorithms, while banks
of observers also allow isolating faults.
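
The behaviour of an observer-based residual can be sketched for a scalar system. The example assumes a first-order plant, an observer gain h chosen so that |a − hc| < 1, and an additive sensor fault Δy appearing at a given step; all numerical values and function names are illustrative assumptions. The observer itself uses only the input and the measured output.

```python
def simulate_plant(a, b, c, u, x0, sensor_bias, fault_from):
    """Plant x(k+1) = a x(k) + b u(k); measurement with an additive fault."""
    x, y = x0, []
    for k, uk in enumerate(u):
        y.append(c * x + (sensor_bias if k >= fault_from else 0.0))
        x = a * x + b * uk
    return y


def run_observer(a, b, c, h, u, y_meas, xhat0=0.0):
    """Return residuals r(k) = y(k) - c * xhat(k) of a scalar observer."""
    xhat, residuals = xhat0, []
    for uk, yk in zip(u, y_meas):
        r = yk - c * xhat
        residuals.append(r)
        xhat = a * xhat + b * uk + h * r  # observer state update
    return residuals


a, b, c, h = 0.8, 1.0, 1.0, 0.5   # estimation error decays as (a - h*c)^k
u = [1.0] * 40
y = simulate_plant(a, b, c, u, x0=2.0, sensor_bias=1.0, fault_from=20)
r = run_observer(a, b, c, h, u, y)
```

Before the fault, the residual decays from the initial estimation error towards zero. After the sensor fault appears at k = 20, the residual jumps by the fault size and then settles at a non-zero value, because the observer partly follows the faulty measurement.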

2.3.4. Transfer functions of linear systems 

Operator transmittance is a common method of describing a dynamic system.
It is defined as the ratio of the output signal Laplace transform y(s) to the
input signal Laplace transform u(s), assuming that the initial conditions are
equal to zero:

$$G(s) = \frac{y(s)}{u(s)}. \tag{2.22}$$

For a stationary linear system that has one input and one output and is
described by the following linear differential equation with constant
coefficients:

$$a_n \frac{d^n y}{dt^n} + a_{n-1} \frac{d^{n-1} y}{dt^{n-1}} + \dots + a_0 y = b_m \frac{d^m u}{dt^m} + b_{m-1} \frac{d^{m-1} u}{dt^{m-1}} + \dots + b_0 u \quad (n > m), \tag{2.23}$$

operator transmittance is a rational function of a variable s (a ratio of two
polynomials):

$$G(s) = \frac{y(s)}{u(s)} = \frac{L(s)}{M(s)}, \tag{2.24}$$

where

$$L(s) = \sum_{i=0}^{m} b_i s^i, \qquad M(s) = \sum_{i=0}^{n} a_i s^i. \tag{2.25}$$


The dynamic properties of a multi-dimensional linear (stationary) system
that has p inputs and q outputs are defined by the matrix of operator
transmittances:

$$G(s) = \frac{y(s)}{u(s)} = \begin{bmatrix} G_{11}(s) & G_{12}(s) & \cdots & G_{1p}(s) \\ G_{21}(s) & G_{22}(s) & \cdots & G_{2p}(s) \\ \vdots & \vdots & & \vdots \\ G_{q1}(s) & G_{q2}(s) & \cdots & G_{qp}(s) \end{bmatrix}, \tag{2.26}$$

where $G_{ij}(s) = y_i(s)/u_j(s)$, i = 1, 2, ..., q; j = 1, 2, ..., p is the
operator transmittance between the i-th output and the j-th input of the
system. Any i-th output is defined by the following formula:

$$y_i(s) = G_i(s)u(s) = G_{i1}(s)u_1(s) + G_{i2}(s)u_2(s) + \dots + G_{ip}(s)u_p(s). \tag{2.27}$$

Equations of this form are called equations of consistence. The effect of
faults can be modelled in these equations as follows:

$$y_i(s) = G_i(s)u(s) + G_{Fi}(s)f(s). \tag{2.28}$$

In this case, the matrix of transmittances possesses additional columns, which
contain transmittances for particular output-fault pairs
$G_{Fik}(s) = y_i(s)/f_k(s)$. The matrix of transmittances can be calculated
on the grounds of state equations from the following formula:

$$G(s) = C[sI - A]^{-1}B + D. \tag{2.29}$$
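
As a worked illustration of this formula, consider an assumed second-order system with one input and one output, written in companion form. The matrices below are chosen for illustration only and are not taken from the text.

```latex
A = \begin{bmatrix} 0 & 1 \\ -a_0 & -a_1 \end{bmatrix}, \quad
B = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \quad
C = \begin{bmatrix} b_0 & b_1 \end{bmatrix}, \quad
D = 0,

[sI - A]^{-1}
  = \begin{bmatrix} s & -1 \\ a_0 & s + a_1 \end{bmatrix}^{-1}
  = \frac{1}{s^2 + a_1 s + a_0}
    \begin{bmatrix} s + a_1 & 1 \\ -a_0 & s \end{bmatrix},

G(s) = C[sI - A]^{-1}B + D
     = \frac{1}{s^2 + a_1 s + a_0}
       \begin{bmatrix} b_0 & b_1 \end{bmatrix}
       \begin{bmatrix} 1 \\ s \end{bmatrix}
     = \frac{b_1 s + b_0}{s^2 + a_1 s + a_0}.
```

The result is a ratio of two polynomials in s, with the numerator built from the entries of C and the denominator equal to the characteristic polynomial of A.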

Discrete transmittances are used for describing systems with discrete time.
A discrete transmittance is the ratio of the z-transform of the output signal
y(z) to the z-transform of the input signal u(z), assuming that the initial
conditions are equal to zero:

$$G(z) = \frac{y(z)}{u(z)}. \tag{2.30}$$

For a linear system described by a difference equation of the n-th order:

$$\sum_{i=0}^{n} a_i y(k+i) = \sum_{j=0}^{m} b_j u(k+j), \tag{2.31}$$

the formula for the transmittance (with the normalisation $a_n = 1$) takes
the form

$$G(z) = \frac{y(z)}{u(z)} = \frac{b_m z^m + b_{m-1} z^{m-1} + \dots + b_1 z + b_0}{z^n + a_{n-1} z^{n-1} + \dots + a_1 z + a_0}. \tag{2.32}$$

Multiplying the numerator and denominator of the transmittance by $z^{-n}$,
it is possible to obtain

$$G(z) = \frac{b_m z^{m-n} + b_{m-1} z^{m-n-1} + \dots + b_1 z^{1-n} + b_0 z^{-n}}{1 + a_{n-1} z^{-1} + \dots + a_1 z^{1-n} + a_0 z^{-n}}. \tag{2.33}$$



Similarly as in continuous systems, the dynamic properties of a
multi-dimensional linear (stationary) system that has p inputs and q outputs
are defined by the discrete transmittance matrix:

$$G(z) = \begin{bmatrix} G_{11}(z) & G_{12}(z) & \cdots & G_{1p}(z) \\ G_{21}(z) & G_{22}(z) & \cdots & G_{2p}(z) \\ \vdots & \vdots & & \vdots \\ G_{q1}(z) & G_{q2}(z) & \cdots & G_{qp}(z) \end{bmatrix}, \tag{2.34}$$

where $G_{ij}(z) = y_i(z)/u_j(z)$, i = 1, 2, ..., q; j = 1, 2, ..., p is the
discrete transmittance between the i-th output and the j-th input of the
system.

The discrete transmittance matrix can be calculated from discrete state
equations according to the following formula:

$$G(z) = C[zI - A]^{-1}B + D. \tag{2.35}$$

Residuals that have one of the following shapes can be generated on the
grounds of system transmittances:

$$r(s) = y(s) - G(s)u(s), \tag{2.36}$$

$$r(s) = M(s)y(s) - L(s)u(s). \tag{2.37}$$

Continuous and discrete transmittances describe the dynamic properties 
of linear systems. In the case of non-linear systems, they model the dynamic 
properties of the system only in the neighbourhood of the point of operation. 
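
For sampled signals, a residual of the second shape can be computed directly in the time domain. The sketch below assumes a first-order discrete model y(k) + a0·y(k−1) = b0·u(k−1), i.e., a denominator 1 + a0·z⁻¹ and a numerator b0·z⁻¹; the coefficients, the data and the function name are illustrative assumptions.

```python
def consistency_residuals(y, u, a0, b0):
    """r(k) = y(k) + a0*y(k-1) - b0*u(k-1) for k = 1, 2, ..."""
    return [y[k] + a0 * y[k - 1] - b0 * u[k - 1] for k in range(1, len(y))]


a0, b0 = -0.5, 2.0
u = [1.0, 1.0, 1.0, 1.0, 1.0]

# Fault-free output generated by the assumed model itself
y = [0.0]
for k in range(1, len(u)):
    y.append(-a0 * y[k - 1] + b0 * u[k - 1])
r_ok = consistency_residuals(y, u, a0, b0)

# Additive sensor offset of 0.5 starting from k = 3
y_faulty = y[:3] + [v + 0.5 for v in y[3:]]
r_fault = consistency_residuals(y_faulty, u, a0, b0)
```

The residual is zero as long as the data satisfy the model and becomes non-zero from the step at which the offset appears.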



2.3.5. Neural models 

Structures of artificial neural networks that have been intensively developing 
during the last decade can successfully be applied to dynamic system mod- 
elling. It has been noticed that apart from the possibility of training on the 
grounds of measurement data, the important advantages of neural networks 
are, among others, the ability of modelling any non-linearities, high robust- 
ness to disturbances, and the ability of generalising knowledge contained in 
the network. Some basic information about the structures and training algo- 
rithms of artificial neural networks can be found in the textbooks by Hertz 
et al (1991), Haykin (1994), Liu (2001), and Norgaard et al. (2000). 

An artificial neural network is a set of connected non-linear units, called
neurons, that act in parallel. The basic elements of the neuron model are
input signal multiplication blocks and a neuron activation block, which
usually realises one of the following functions: a linear, signum, sigmoid,
or hyperbolic tangent one. The activation block generates the output signal y
of the neuron.

Multi-layer perceptron-type networks are most widely applied to the
modelling of systems. Such a network contains an input layer, one or more
hidden layers, and an output layer. Neurons in the input layer realise the
initial conversion of input data, e.g., range standardisation. The actual
conversion is realised in hidden layers and in the output layer. Neurons in
neighbouring layers are connected according to the rule “each one with all of
the others”, and a weight is attributed to each one of the connections.
Knowledge contained in the network is represented by its structure and weight
values.
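
The forward pass of such a network can be sketched as follows. The example assumes one hidden layer with the hyperbolic tangent activation and a linear output layer; the bias terms and all weight values are assumptions introduced for illustration.

```python
import math


def layer(inputs, weights, biases, activation):
    """One fully connected layer: activation(W x + b) for each neuron."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def mlp(x, W1, b1, W2, b2):
    """Two-layer perceptron: tanh hidden layer, linear output layer."""
    hidden = layer(x, W1, b1, math.tanh)
    return layer(hidden, W2, b2, lambda s: s)


# Arbitrary illustrative weights: 2 inputs, 2 hidden neurons, 1 output
W1 = [[1.0, -1.0], [0.5, 0.5]]
b1 = [0.0, 0.0]
W2 = [[1.0, 2.0]]
b2 = [0.1]
out = mlp([0.0, 0.0], W1, b1, W2, b2)
```

In fault detection, the output of a trained network of this kind would play the role of the modelled signal that is compared with the measured one.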

An example of a network structure is shown in Fig. 2.4. 




Fig. 2.4. Perceptron network structure shown as an example



Designing a system of artificial neural networks for the implementation of 
a specific task requires defining the network structure, i.e., making a decision 
about the number of layers and the number of neurons in each layer, as well as 
the choice of weight parameter values, and the character and coefficients of the 
activation function. The choice of the structure has so far been made by
trial and error, since methods of constructing networks that
possess the required properties are not known. A neural network is usually a 
black box. The particular elements of its structure as well as weights have no 
physical relevance to the system structure and parameters. Calculating the 
weight coefficient value takes place during the process of training. 

It was shown (Hertz et al., 1995) that continuous functions can be
approximated with any accuracy by perceptron networks that have one hidden
layer and a linear activation function in the output neurons. However, it is
the designer who decides about the choice of the number of the hidden layers.
The discontinuous character of the network output changes can be mapped
well by the application of two hidden layers having a sufficient number of
neurons in each layer.

Radial Basis Function (RBF) networks are also applied to the modelling of
static systems for the needs of fault detection. Only the neurons in the
hidden layer of such a network realise non-linear mapping with the use of a
basis function that changes radially around a chosen centre. For instance,
the Gaussian function is radial. Output neurons are usually linear.
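
A network of this kind with a scalar input can be sketched in a few lines. The Gaussian parametrisation with a width parameter, the bias term and the function names are assumptions made for illustration.

```python
import math


def gaussian(x, centre, width):
    """Gaussian radial basis function of a scalar input."""
    return math.exp(-((x - centre) ** 2) / (2.0 * width ** 2))


def rbf_output(x, centres, widths, weights, bias=0.0):
    """Linear output neuron fed by Gaussian hidden neurons."""
    return bias + sum(w * gaussian(x, c, s)
                      for w, c, s in zip(weights, centres, widths))
```

Each hidden neuron responds most strongly at its own centre, so an input close to one centre and far from the others yields an output dominated by the weight of that neuron.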

Neural networks of the GMDH type (Group Method of Data Handling) are
another method of dynamic system modelling (Pham and Xing, 1995; Korbicz and
Kus, 1999). Such networks use the method of data group processing. The
algorithm allows organising the network structure in an evolutionary way. The
form of the neuron is also different: it usually is a sum of products of a
parameter and a function of the neuron input signals, and is thus linear with
respect to the parameters.

One-directional perceptron networks realise static mapping. One-directional
multi-layer perceptron networks having a delay line in input signal lines
(Fig. 2.5) are applied in order to take system dynamics into account. Another
solution is to use recurrent networks, in which there exists feedback between
the output layer or a hidden layer and the input layer. Dynamic neural
networks are based on the dynamic neuron model. In comparison with the static
model, the neuron dynamic model has been expanded with a filter module.
The module contains a linear dynamic system that is defined by a discrete
transmittance, or a difference equation having the following shape:

$$y(k) = -a_1 y(k-1) - \dots - a_n y(k-n) + b_0 x(k) + b_1 x(k-1) + \dots + b_n x(k-n), \tag{2.38}$$

where x(k) denotes the filter input, y(k) denotes the filter output, and n
is the filter order.

Fig. 2.5. Perceptron network with delay lines in input lines

The filter output is the activation block input. The activity of a neuron 
depends therefore not only on input signals but also on its inner state. A 
neural network built from dynamic neurons has the structure of a multi-layer 
one-directional network (Korbicz et al., 1999).
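
A dynamic neuron of this kind can be sketched as a weighted sum followed by the linear filter and the activation block. The class name, the choice of the hyperbolic tangent as the activation function and the coefficient values in the usage lines are assumptions made for illustration.

```python
import math


class DynamicNeuron:
    """Weighted sum -> linear filter of order n -> tanh activation."""

    def __init__(self, weights, a, b):
        # a = [a1, ..., an], b = [b0, b1, ..., bn] are filter coefficients
        self.weights, self.a, self.b = weights, a, b
        self.n = len(a)
        self.x_hist = [0.0] * self.n  # x(k-1), ..., x(k-n)
        self.y_hist = [0.0] * self.n  # y(k-1), ..., y(k-n)

    def step(self, inputs):
        x = sum(w * u for w, u in zip(self.weights, inputs))
        y = self.b[0] * x
        y += sum(bi * xi for bi, xi in zip(self.b[1:], self.x_hist))
        y -= sum(ai * yi for ai, yi in zip(self.a, self.y_hist))
        # Shift the histories so that the newest value comes first
        self.x_hist = ([x] + self.x_hist)[:self.n]
        self.y_hist = ([y] + self.y_hist)[:self.n]
        return math.tanh(y)  # the filter output feeds the activation block


neuron = DynamicNeuron(weights=[1.0], a=[-0.5], b=[1.0, 0.0])
outputs = [neuron.step([1.0]) for _ in range(3)]
```

With a constant input, the neuron output changes from step to step because of the stored filter state, which is how the inner state influences the activity of the neuron.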

2.3.6. Fuzzy models 

Analytical models of systems are often unknown, and knowledge about the 
diagnosed system is inaccurate. It is formulated by specialists and has the 
form of if-then rules containing the linguistic evaluation of process variables 
such as high temperature or low pressure. In such a case, fuzzy models can 
be applied to fault detection. Principles of fuzzy modelling are discussed, for
example, in (Czogała and Łęski, 2000; Piegat, 2001; Rutkowska, 2002; Yager
and Filev, 1994). Such models are based on the theory of fuzzy sets, which
was introduced by Zadeh in 1965.

A fuzzy set A in a certain numerical space of discourse X is the set of
pairs

$$A = \{(\mu_A(x), x)\}, \quad \forall x \in X, \tag{2.39}$$

where $\mu_A(x)$ is the membership function of the fuzzy set A, and
$\mu_A(x) \in [0, 1]$. The membership function realises the mapping of the
numerical space X of a variable to the range [0, 1], i.e.,
$\mu_A: X \to [0, 1]$. Contrary to the classical theory of sets, where an
element can either belong to a set or not, in the case of fuzzy sets an
element x belongs to the fuzzy set with a certain degree:

$$\mu_A(x): X \to [0, 1]. \tag{2.40}$$

Fuzzy sets are attributed to each input and output signal during fuzzy 
modelling. For instance, fuzzy sets for a variable called water temperature 
are shown in Fig. 2.6. Linguistic values, e.g., very low temperature (BN), low 
temperature (N), average temperature (S), high temperature (W), and very 
high temperature (BW) are attributed to particular fuzzy sets. Different kinds 
of membership functions are applied. Those most often used are trapezoidal,
triangular, and Gaussian functions.
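The three shapes mentioned above can be written down directly. The following
Python sketch is illustrative only; the breakpoints of the sets N, S and W are
assumed values chosen for the example and are not read from Fig. 2.6, and the
function names are hypothetical.

```python
import math


def triangular(x, a, b, c):
    """Triangle with feet at a and c and the peak at b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def trapezoidal(x, a, b, c, d):
    """Trapezoid with feet at a and d and a plateau on [b, c]."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


def gaussian(x, centre, sigma):
    """Bell-shaped function with its maximum at `centre`."""
    return math.exp(-0.5 * ((x - centre) / sigma) ** 2)


# Assumed breakpoints (degrees Celsius) for three of the linguistic values.
TEMPERATURE_SETS = {
    "N": (10.0, 30.0, 50.0),   # low temperature
    "S": (30.0, 50.0, 70.0),   # average temperature
    "W": (50.0, 70.0, 90.0),   # high temperature
}


def fuzzify(temperature):
    """Return the membership degree of a crisp value in every fuzzy set."""
    return {name: triangular(temperature, *abc)
            for name, abc in TEMPERATURE_SETS.items()}
```

With these assumed sets, a temperature of 40 degrees belongs to N and to S
with the degree 0.5 each, which illustrates that neighbouring fuzzy sets
overlap.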

Knowledge about the system is described by rules of the following form:

$$R_i: \ \text{if } (x_1 = A_{1i}) \text{ and } (x_2 = A_{2i}) \text{ and } \dots \text{ and } (x_n = A_{ni}) \text{ then } (y = B_j), \tag{2.41}$$

where $x_k$ denotes the $k$-th input, $A_{ki}$ is the $i$-th fuzzy set of the
$k$-th input, $y$ denotes the output, and $B_j$ is the $j$-th fuzzy set of
the output.

Fig. 2.6. Examples of fuzzy sets applied to
the rough estimation of water temperature

The set of all of the rules forms the rule base.

A fuzzy model structure (Piegat, 2001) is shown in Fig. 2.7. It contains
three blocks: the fuzzyfication block, the inference block and the
defuzzyfication block. Input signal values are fed to the fuzzyfication
block. This block fuzzyfies the inputs, i.e., it determines the degree of
membership of each input signal in particular fuzzy sets. On the basis of
these membership degrees, the resulting membership function of the output is
determined in the inference block. The inference mechanism can be realised in
many ways by means of different operators used in fuzzy logic. They are
discussed in detail in (Piegat, 2001) and (Yager and Filev, 1994). On the
basis of the resulting membership function of the output, a precise (crisp)
value of the output is calculated in the defuzzyfication block.

Fig. 2.7. Fuzzy model structure
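The three blocks can be illustrated with a small single-input model. The
sketch below assumes one particular choice of operators (minimum for clipping
the conclusions, maximum for aggregation, centre of gravity on a sampled
output range for defuzzyfication); other operators are equally admissible. The
fuzzy sets, the two rules and all names are hypothetical.

```python
def tri(x, a, b, c):
    """Triangular membership function with the peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical fuzzy sets of the input and of the output (range 0..100).
IN_SETS = {"low": (0.0, 20.0, 40.0), "high": (20.0, 40.0, 60.0)}
OUT_SETS = {"small": (0.0, 25.0, 50.0), "large": (50.0, 75.0, 100.0)}
# Rule base: if input is <premise> then output is <conclusion>.
RULES = [("low", "large"), ("high", "small")]


def fuzzy_model(x, steps=200):
    # Fuzzyfication block: membership degree of x in every input set.
    degrees = {name: tri(x, *abc) for name, abc in IN_SETS.items()}

    # Inference block: clip each conclusion by its firing level (min)
    # and aggregate the clipped sets (max) on a sampled output range.
    ys = [100.0 * i / steps for i in range(steps + 1)]
    aggregated = [
        max(min(degrees[premise], tri(y, *OUT_SETS[conclusion]))
            for premise, conclusion in RULES)
        for y in ys
    ]

    # Defuzzyfication block: centre of gravity of the aggregated set.
    total = sum(aggregated)
    if total == 0.0:
        return None  # no rule fired for this input
    return sum(y * m for y, m in zip(ys, aggregated)) / total
```

An input that fires no rule yields no crisp output here; a practical model
would normally use input sets that cover the whole operating range.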

The knowledge of an expert, e.g., an engineer or the process operator, 
on the basis of which the rules determining the operation of the system 
are defined, can be used for constructing the model. However, the direct 
approach to model construction has serious disadvantages. If the expert’s 
knowledge is incomplete or faulty, an incorrect model can be obtained. While 
constructing the model, one should also apply data from measurements. Large
sets of process variable values are registered and stored by Distributed
Control Systems (DCS) or Supervisory Control and Data Acquisition (SCADA)
systems.

It is advisable to combine the expert’s knowledge with available measurement
data while constructing a fuzzy model. The expert’s knowledge is useful for
defining the structure and initial values of the model parameters (the
distribution of the membership functions), and measurement data are useful
for model tuning. This concept has been applied in fuzzy neural networks.
They are convenient modelling tools for residual generation since they allow
combining the fuzzy modelling technique with neural network training
methods. Fuzzy neural networks have been discussed, for example, in
(Horikawa et al., 1999; Fuller, 1995; Jang, 1995; Rutkowska, 2002).

While constructing a fuzzy neural network, it is possible to make use 
of the expert’s knowledge for defining the number of if-then rules and for 
the initial distribution of the membership functions of particular rules, and 
measurement data can be applied to network training (weight tuning). The 
model obtained with the help of fuzzy neural networks does not constitute a 
black box. It can be easily written in the form of if-then rules and interpreted 
as a fuzzy model. 

Fuzzy Neural Networks (FNN) have a structure that represents the fuzzy
inference process. Two parts can be singled out in such networks. The first
part corresponds to rule premises, i.e., to the part contained between the
words if and then. It realises the part of the inference mechanism that is
responsible for the calculation of the rule firing level. The second part,
which follows the word then, corresponds to fuzzy rule conclusions. It
calculates the network output using the computed premise values.

Fuzzy neural networks that have the simplest structure, with outputs in the
form of singletons, are defined by the following set of rules:

$$R_i: \ \text{if } x_1 \text{ is } A_{1i} \text{ and } x_2 \text{ is } A_{2i} \text{ and } \dots \text{ and } x_n \text{ is } A_{ni} \text{ then } y \text{ is } y_i, \tag{2.42}$$

where $x_j$ ($j = 1, 2, \dots, n$) denotes the $j$-th input of the model, $n$
is the number of inputs, $A_{ji}$ denotes a fuzzy set of the premise that has
the membership function $\mu_{A_{ji}}(x_j)$, $y$ denotes the output of the
model, and $y_i$ is the output value (a singleton).

A fuzzy neural network that has two inputs and nine rules, and uses the
bell-shaped Gaussian function as the membership function, is presented in
Fig. 2.8. Five layers can be singled out in the network structure. The layers
(A) to (D) correspond to the premise part of the rules and calculate the rule
firing levels. The network output is calculated in the layers (D) and (E),
which correspond to rule conclusions. The layers (A), (B) and (C) of the
fuzzy neural network structure are responsible for computing the premise
membership function values. The layer (A) of the network is only symbolic and
is used for delivering the network inputs as well as a signal equal to 1 to
particular units of the layer (B). The layer (B) reflects input signal
fuzzyfication operations. The number of adding nodes for particular inputs
equals the number of fuzzy sets into which the range of the input is divided.
The layer (C) output is the value of the membership function of the set
$A_{ji}$ for a particular input $x_j$. Coefficients of the input membership
function for particular fuzzy sets that are described by the Gaussian
function

(2.43)

are calculated in this layer for each of the inputs.

Fig. 2.8. Diagram of a fuzzy neural network that has two inputs and nine rules

Analysing the above formula, it is possible to observe that the weights
$w_c$ and $w_g$ are parameters that determine the position of the membership
function in the input space and its shape or, more precisely, its slope.

The number of rules in the network is defined by the formula
$m = \prod_{j=1}^{n} k_j$, where $k_j$ is the number of fuzzy sets for the
$j$-th input. The inference process is carried out in the layers (D) and (E).
The firing level of particular rules is calculated in the layer (D) according
to the following formula:

$$\hat{\mu}_i = \frac{\prod_{j=1}^{n} \mu_{A_{ji}}(x_j)}{\sum_{l=1}^{m} \prod_{j=1}^{n} \mu_{A_{jl}}(x_j)}. \tag{2.44}$$

The network output is calculated in the layer (E) according to the formula

$$y = w_1 \hat{\mu}_1 + w_2 \hat{\mu}_2 + \dots + w_m \hat{\mu}_m. \tag{2.45}$$

The weights of the network branches between the layers (D) and (E)
represent the singleton values $y_i$ in the rules (2.42), $w_i = y_i$.
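The calculations carried out in the layers (C), (D) and (E) can be sketched as
a short forward pass. The sketch is illustrative and rests on assumptions: the
Gaussian membership function is parameterised here by a centre and a width,
which need not coincide with the parameterisation by the weights $w_c$ and
$w_g$ used in (2.43); the rules are enumerated as all combinations of the
fuzzy sets of the inputs; and the function names are hypothetical.

```python
import math
from itertools import product


def gaussian(x, centre, sigma):
    """Assumed Gaussian membership function (centre/width form)."""
    return math.exp(-0.5 * ((x - centre) / sigma) ** 2)


def rule_count(sets):
    """Number of rules m: the product of the numbers of fuzzy sets k_j."""
    return math.prod(len(sets_j) for sets_j in sets)


def fnn_output(x, sets, w):
    """Forward pass of a singleton-output fuzzy neural network.

    x    -- list of the n input values
    sets -- sets[j] is a list of (centre, sigma) pairs for the j-th input
    w    -- one singleton weight per rule, ordered like
            itertools.product over the fuzzy sets of the inputs
    """
    # Layers (A)-(C): membership degree of every input in each of its sets.
    mu = [[gaussian(x_j, c, s) for c, s in sets_j]
          for x_j, sets_j in zip(x, sets)]

    # Layer (D): product of the premise memberships of every rule,
    # normalised by the sum over all rules, as in (2.44).
    firing = [math.prod(combo) for combo in product(*mu)]
    total = sum(firing)
    if total == 0.0:
        raise ValueError("no rule fires for this input")
    normalised = [f / total for f in firing]

    # Layer (E): weighted sum of the normalised firing levels, as in (2.45).
    return sum(w_i * f_i for w_i, f_i in zip(w, normalised))
```

With two inputs and three fuzzy sets per input, `rule_count` gives nine rules,
which matches the network of Fig. 2.8. Since the firing levels are normalised,
the output always lies between the smallest and the largest singleton.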

Fuzzy modelling has many advantages, one of them being the possibility 
of using the expert’s knowledge as well as measurement data for model con- 
struction in situations when knowledge about the system is incomplete and 
inaccurate. It also allows mapping strongly non-linear relations that may 
exist between the input and output signals. 



2.4. Models applied to fault isolation and 
system state recognition 

Diagnostics deals with the recognition of technical system states. Changes of 
the states are caused by faults, including system wear, etc. State recognition 
consists therefore in showing the existing faults. In some cases the recognition 
of particular faults is not possible or necessary, and it is sufficient to define 
a class of the states in which the system remains. Particular classes of states 
can contain subsets of states with different faults, or system states having a 
similar degree of the degradation of its properties. 
The following kinds of diagnostic signals are applied as input signals during
fault isolation or the system state recognition (classification) process:

• residuals generated on the grounds of system models, 

• binary or multi-value signals created as a result of residual value
evaluation (quantification),

• binary or multi-value signals (features) generated with the use of
classical and heuristic fault detection methods,

• statistical parameters (features) that describe random signal properties,

• process variables, i.e., measured or calculated values of physical quan- 
tities. 

Models applied to fault isolation or system state recognition should
therefore map the space of diagnostic signal values into the discrete space
of faults or system states (Fig. 2.9).




Fig. 2.9. Fault isolation using the evaluation of diagnostic signal values 

It is possible to single out the following kinds of models: 

(a) models that map the space of binary diagnostic signals into the space 
of faults or system states, 

(b) models that map the space of multi-value diagnostic signals into the 
space of faults or system states, 

(c) models that map the space of continuous diagnostic signals into the 
space of faults or system states. 

The above models can be defined on the basis of: training, knowledge
about the hardware redundant structure, modelling the influence of faults on
residual values, as well as the expert’s knowledge.

Training data for the state of complete efficiency and for all of the states 
with faults, or at least a definition of the fault states are necessary in the 
case of applying the training procedure. Such data are difficult and often 
impossible to obtain in the case of the diagnostics of industrial processes. 

It is relatively easy to define the relationship that exists between
diagnostic signal values and faults when a $k$-out-of-$N$-type hardware
redundant structure is applied to the diagnosed system. However, such a
solution is very rarely used due to its high cost.

If equations for the generation of residuals that contain the effect of faults, 
e.g., the equations (2.5)-(2.8), are known, then it is possible to define residual 
value ranges as well as diagnostic signal values that correspond to residuals for 
the state without faults and for states with faults as a result of the simulation 
of the faults. Sets of diagnostic signal values are obtained for particular faults 
and for the state of complete efficiency. The sets define specific regions in the 
space of diagnostic signals. Such a way of proceeding is very rational, but also 
difficult and labour-consuming. The main difficulty comes from the necessity 
of obtaining a mathematical description of the system that takes the effect 
of the analysed faults into account. 

Another method consists in using an expert’s knowledge. The expert 
should define diagnostic signal values that correspond to particular faults. 
As a result, diagnostic signal space regions that correspond to states with 
single faults and to the state of efficiency are arbitrarily defined. 

2.4.1. Models mapping the space of binary diagnostic signals 
into the space of faults or system states 

Binary diagnostic signals $s_j \in \{0, 1\}$ originate as a result of a
two-value evaluation of residuals or process variable value features. They
are also generated as a result of implementing tests which consist in
checking limits or examining the heuristic relationships existing between
process variables.

Models necessary for fault isolation or system state recognition on the 
basis of binary diagnostic signals realise the following mappings: 

$$S \in \{0, 1\}^J \Rightarrow F \in \{0, 1\}^K, \tag{2.46}$$

$$S \in \{0, 1\}^J \Rightarrow Z \in \{0, 1\}^I, \tag{2.47}$$

where J denotes the number of diagnostic signals, K is the number of faults, 
and I denotes the number of states or state classes. 

It is possible to single out the following models that belong to this group: 
the binary diagnostic matrix, binary trees and diagnostic graphs, if-then 
rules, and logic functions. 

2.4.1.1. Binary diagnostic matrix

The model most often applied is a relation defined on the Cartesian product
of the sets of faults $F = \{f_k : k = 1, 2, \dots, K\}$ and diagnostic
signals $S = \{s_j : j = 1, 2, \dots, J\}$:

$$R_{FS} \subset F \times S. \tag{2.48}$$

The expression $f_k R_{FS} s_j$ means that a diagnostic signal $s_j$ detects
a fault $f_k$. In other words, the existence of the fault $f_k$ causes the
appearance of the diagnostic signal $s_j$ whose value equals 1, i.e., it
causes the appearance of a symptom. The matrix of the relation $R_{FS}$ is
the binary diagnostic matrix (Gertler, 1998; Koscielny, 2001). An element of
the matrix is defined as follows:

$$r(f_k, s_j) = v_j(f_k) = \begin{cases} 0 & \Leftrightarrow (f_k, s_j) \notin R_{FS}, \\ 1 & \Leftrightarrow (f_k, s_j) \in R_{FS}. \end{cases} \tag{2.49}$$



The relation $R_{FS}$ can be defined by attributing to each diagnostic signal
a subset of faults $F(s_j)$ that are detected by this signal:

$$F(s_j) = \{f_k \in F : f_k R_{FS} s_j\}. \tag{2.50}$$

The relation can also be defined by attributing to each fault $f_k \in F$ a
subset of diagnostic signals $S(f_k)$ that detect this fault:

$$S(f_k) = \{s_j \in S : f_k R_{FS} s_j\}, \tag{2.51}$$

where the set $S(f_k)$ defines the set of symptoms that correspond to the
$k$-th fault.

Let us also define a fault signature as a vector of diagnostic signal values
that correspond to this fault:

$$V(f_k) = \begin{bmatrix} v_1(f_k) \\ v_2(f_k) \\ \vdots \\ v_J(f_k) \end{bmatrix}, \tag{2.52}$$

where $v_j(f_k) \in \{0, 1\}$. If the signatures of two faults are identical,
then the faults are indistinguishable.

The binary diagnostic matrix can be presented in the form of a graph
$G_{FS}$, whose set of vertices consists of the two sets $F$ and $S$, and
whose set of edges shows the relations that exist between them:

$$G_{FS} = (F, S, R_{FS}). \tag{2.53}$$

The binary diagnostic matrix can be defined on the basis of equations
of residuals that take the effect of faults into account. For example, for
the three-tank set shown in Fig. 1.3 it is possible to define the sensitivity
of residuals to particular faults on the basis of the equations
(1.24)-(1.27). The relationship can be written in the following form:

$$r_1 = r_1(f_1, f_5, f_6, f_7, f_8), \tag{2.54}$$

$$r_2 = r_2(f_1, f_2, f_3, f_9, f_{12}), \tag{2.55}$$

$$r_3 = r_3(f_2, f_3, f_4, f_9, f_{10}, f_{13}), \tag{2.56}$$

$$r_4 = r_4(f_3, f_4, f_{10}, f_{11}, f_{14}). \tag{2.57}$$
The above formulae define the subsets of faults (2.50) detected by particular
diagnostic signals:

$$F(s_1) = \{f_1, f_5, f_6, f_7, f_8\}, \tag{2.58}$$

$$F(s_2) = \{f_1, f_2, f_3, f_9, f_{12}\}, \tag{2.59}$$

$$F(s_3) = \{f_2, f_3, f_4, f_9, f_{10}, f_{13}\}, \tag{2.60}$$

$$F(s_4) = \{f_3, f_4, f_{10}, f_{11}, f_{14}\}. \tag{2.61}$$

The binary diagnostic matrix that corresponds to the above sets is presented
in Table 2.1. The matrix can also be defined using an expert’s knowledge by
analysing the influence of particular faults on diagnostic signal values
(Koscielny, 2001).



Table 2.1. Binary diagnostic matrix for a three-tank set 
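The matrix and the fault signatures follow mechanically from the sets
(2.58)-(2.61). The Python sketch below rebuilds them from those sets; it
assumes that the faults are numbered $f_1$ to $f_{14}$ (the highest index that
appears in the sets) and that rows correspond to diagnostic signals and
columns to faults. The function names are hypothetical.

```python
# Subsets of faults detected by each diagnostic signal, from (2.58)-(2.61).
DETECTED_BY = {
    "s1": {"f1", "f5", "f6", "f7", "f8"},
    "s2": {"f1", "f2", "f3", "f9", "f12"},
    "s3": {"f2", "f3", "f4", "f9", "f10", "f13"},
    "s4": {"f3", "f4", "f10", "f11", "f14"},
}
SIGNALS = list(DETECTED_BY)
FAULTS = [f"f{k}" for k in range(1, 15)]  # assumed numbering f1..f14


def diagnostic_matrix():
    """Binary diagnostic matrix: rows are signals, columns are faults."""
    return [[int(f in DETECTED_BY[s]) for f in FAULTS] for s in SIGNALS]


def signature(fault):
    """Fault signature (2.52): the column of the matrix for this fault."""
    return tuple(int(fault in DETECTED_BY[s]) for s in SIGNALS)


def indistinguishable_groups():
    """Groups of faults that share an identical signature."""
    groups = {}
    for fault in FAULTS:
        groups.setdefault(signature(fault), []).append(fault)
    return [group for group in groups.values() if len(group) > 1]
```

For these sets, `indistinguishable_groups()` returns the groups
$\{f_2, f_9\}$, $\{f_4, f_{10}\}$, $\{f_5, f_6, f_7, f_8\}$ and
$\{f_{11}, f_{14}\}$, i.e., four residuals are not sufficient to distinguish
all fourteen faults.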
2.4.1.2. Diagnostic trees and graphs

The relationship that exists between faults and diagnostic signal values can
be presented in the form of a binary tree that defines the method of
diagnostic inference. The tree vertices correspond to diagnostic signals
(tests). Two branches come out of each vertex, corresponding to the two
values of the signal, i.e., the positive and the negative result of the test.
The signal whose value is analysed first is the root of the tree. Leaf
vertices of the tree correspond to diagnoses. An example of a binary
diagnostic tree for a three-tank set is shown in Fig. 2.10. Such a tree can
be defined using the binary diagnostic matrix, or directly on the basis of an
expert’s knowledge. Many different diagnostic trees can be derived from the
binary diagnostic matrix. They differ in the order in which particular
diagnostic signals are analysed. However, the number of leaf vertices and the
diagnoses attributed to them are identical for all of the trees. The
AND/OR/NOT graphs (Ligęza and Fuster-Parra, 1997) applied to abductive
inference are yet another method of representing diagnostic relations.
2.4.1.3. Rules and logic functions

The relationship existing between faults and binary diagnostic signal values 
can be defined in the form of rules of the following types: 

$$\text{if } (s_1 = 0) \text{ and } \dots \text{ and } (s_j = 1) \text{ and } \dots \text{ and } (s_J = 1) \text{ then fault } f_k, \tag{2.62}$$

$$\text{if } s_j = 1 \text{ then fault } f_a \text{ or } \dots \text{ or } f_k \text{ or } f_n. \tag{2.63}$$

Let us notice that each column of the binary diagnostic matrix defines
a (2.62)-type rule, and the rows (Table 2.1) correspond to rules of the
form (2.63). Rules of the form (2.62) are the basis for inference in fuzzy
diagnostic systems. The difference is that the fuzzy two-value evaluation of
residuals is applied instead of the threshold technique.

The logic function is the simplest possible relationship that exists between 
symptoms and faults. Binary diagnostic signals act as input signals, and the 
binary output signal shows the state of a particular fault, i.e., its existence 
or absence. In a general case, such a function takes the following form: 

$$z(f_k) = [s_a \vee \dots \vee s_b] \wedge [s_c \vee \dots \vee s_d] \wedge \dots \wedge [s_e \vee \dots \vee s_g]. \tag{2.64}$$

A simple conjunction of diagnostic signals of the form

$$s_j \wedge \dots \wedge s_n \tag{2.65}$$

is most often applied.
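The three kinds of relationships can be compared on the three-tank example.
The sketch below is illustrative only: it reuses the sets (2.58)-(2.61),
derives the premise of a column rule of the form (2.62) and the conclusion of
a row rule of the form (2.63), and evaluates a simple conjunction of the form
(2.65) over the symptoms of a fault. All function names are hypothetical.

```python
DETECTED_BY = {
    "s1": {"f1", "f5", "f6", "f7", "f8"},
    "s2": {"f1", "f2", "f3", "f9", "f12"},
    "s3": {"f2", "f3", "f4", "f9", "f10", "f13"},
    "s4": {"f3", "f4", "f10", "f11", "f14"},
}
SIGNALS = ["s1", "s2", "s3", "s4"]


def fault_index(fault):
    return int(fault[1:])


FAULTS = sorted(set().union(*DETECTED_BY.values()), key=fault_index)


def column_rule(fault):
    """Premise of a (2.62)-type rule: required value of every signal."""
    return {s: int(fault in DETECTED_BY[s]) for s in SIGNALS}


def faults_matching(observed):
    """Faults whose (2.62)-type rule fires for the observed signal values."""
    return [f for f in FAULTS if column_rule(f) == observed]


def row_rule(signal):
    """Conclusion of a (2.63)-type rule: faults indicated by signal = 1."""
    return sorted(DETECTED_BY[signal], key=fault_index)


def conjunction(fault, observed):
    """(2.65)-type function: all symptoms of the fault are present."""
    return all(observed[s] == 1 for s in SIGNALS if fault in DETECTED_BY[s])
```

The column rule also requires the remaining signals to equal 0, so it is more
selective than the simple conjunction: for the observation
$s_1 = 0$, $s_2 = s_3 = s_4 = 1$ only $f_3$ satisfies its column rule, whereas
the conjunction is also true for, e.g., $f_{12}$, whose only symptom is $s_2$.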
2.4.2. Models mapping the space of multi-value diagnostic signals
into the space of faults or system states 

Multi-value diagnostic signals $s_j \in V_j$ appear as a result of residual
value or signal feature quantisation. They can also result from process
variable limit checking with the application of several limiting values. It
is assumed that a different set of values $V_j$ can correspond to each one of
the diagnostic signals.

Models necessary for fault isolation or system state recognition using
multi-value diagnostic signals realise the following mappings:

$$S \in V_1 \times \dots \times V_j \times \dots \times V_J \Rightarrow F \in \{0, 1\}^K, \tag{2.66}$$

$$S \in V_1 \times \dots \times V_j \times \dots \times V_J \Rightarrow Z \in \{0, 1\}^I, \tag{2.67}$$

where J denotes the number of diagnostic signals, K is the number of faults, 
and I denotes the number of states or state classes. 

It is possible to single out the following models belonging to this group: 
information systems, diagnostic trees and graphs, and if-then rules. 

2.4.2.1. Information system

Information systems are vital elements of rough set theory, developed by
Pawlak (1983) in the early 1980s. The information system is defined as

$$IS = (X, A, V_S, r), \tag{2.68}$$

where $X$ denotes a finite set of systems, $A$ is a finite set of attributes,
$V_S = \bigcup_{a \in A} V_a$ is the set of the attribute values, $V_a$
denotes the domain of the attribute $a$, and $r$ is the complete function
defined as follows:

$$r: X \times A \to V_S, \tag{2.69}$$

where $r(x, a) \in V_a$ for each $x \in X$ and $a \in A$.

The requirement that the function be complete means that it is defined
for all values of $x$ and $a$. The above definition describes a simple
information system in which each one of the attributes can have only one
possible value for a given system. The approximate information system also
has single-valued attributes, but it is assumed that the attribute value for
a given system is not known precisely and is only known to belong to a
certain subset. The function $r$ is defined in this case as follows:

$$r: X \times A \to \mathcal{P}(V_S), \tag{2.70}$$

where $r(x, a) \subset V_a$ for each $x \in X$ and $a \in A$.

Each function $\varphi$ with arguments belonging to the set of attributes $A$
and with values belonging to the set $V_S$, $\varphi(a) \in V_a$, is
information in such a system. Information is written as a set of
attribute-value pairs:

$$\{(a_1, v \in V_1), (a_2, v \in V_2), \dots, (a_n, v \in V_n)\}. \tag{2.71}$$

An information system in which all pieces of information are non-empty is
called a complete information system.

The above definitions of the information system and the approximate 
information system are very useful for the description of relations existing 
between faults and diagnostic signals. Let us define a Fault Isolation System 
(FIS) (Koscielny, 2001) as the approximate information system. Let us assume 
that the set of objects is identical with the set of faults: 

$$X = F = \{f_k : k = 1, 2, \dots, K\}, \tag{2.72}$$

that diagnostic signals create the set of attributes:

$$A = S = \{s_j : j = 1, 2, \dots, J\}, \tag{2.73}$$

and that the set of attribute values $V_S$ is the union of the value sets of
all diagnostic signals:

$$V_S = \bigcup_{s_j \in S} V_j. \tag{2.74}$$

The function $r$ is defined in this case on the Cartesian product
$F \times S$, $r: F \times S \to \mathcal{P}(V_S)$. It assigns to each pair
of a fault and a diagnostic signal $(f, s)$ the value (in the simple system)
or values (in the approximate system) of the signal appearing with a given
fault, respectively:

$$r(f_k, s_j) = v_{kj} \in V_j, \tag{2.75}$$

$$r(f_k, s_j) = V_{kj} = \{v_{ji} \in V_j\} \subset V_j. \tag{2.76}$$

Such an FIS, being the adaptation of an information system to the needs
of fault isolation, is defined as follows:

$$FIS = (F, S, V_S, r). \tag{2.77}$$

Therefore, the FIS is a table that defines diagnostic signal pattern values
for particular faults. Table 2.2 presents the general form of such a fault
isolation system. The FIS is a generalisation of the binary diagnostic
matrix. If the value sets of all of the diagnostic signals are identical and
equal $V_S = \{0, 1\}$, then the FIS reduces to the binary diagnostic matrix.
The essential extensions of the FIS in comparison with the binary diagnostic
matrix are as follows:

(a) an individual set of diagnostic signal values can exist for each one of
the diagnostic signals;

(b) the set of values of the $j$-th diagnostic signal (the domain of an
attribute) can be a multi-value one;

(c) any element of the FIS can contain either one diagnostic signal value
or a subset of values.
Table 2.2. FIS - the approximate information system for fault isolation

By giving a subset of values for a given pair of a fault and a diagnostic
signal, the uncertainty of the relation can be taken into account. When a
given fault appears, a particular diagnostic signal is allowed to have any
one of the values shown in the corresponding field of Table 2.2.

The signature of the $k$-th fault corresponds to a column of the FIS table,
and it is a generalisation of the signature expressed by the equation (2.52).
It is defined by the following formula:

$$V(f_k) = \begin{bmatrix} V_{k1} \\ V_{k2} \\ \vdots \\ V_{kJ} \end{bmatrix}, \tag{2.78}$$

where $V_{kj}$ is defined by the equation (2.76) and denotes the subset of
possible values of the $j$-th diagnostic signal in the case of the $k$-th
fault.

The set of all diagnostic signal values is information (data) about faults
in the FIS table:

$$\{(s_1, v \in V_1), (s_2, v \in V_2), \dots, (s_J, v \in V_J)\}. \tag{2.79}$$

In order to maintain the analogy with notations used in the preceding
chapters, let us write information in the diagnostic system as a vector of
all diagnostic signal values:

(2.80)

Each piece of information determines a certain set of faults such that their
signatures are consistent with this piece of information (diagnostic signal
values):

$$F(V) = \{f_k \in F : \forall s_j \in S, \ v(s_j) \in V_{kj}\}. \tag{2.81}$$

The FIS is an incomplete information system since there exist combi- 
nations of diagnostic signal values to which no faults correspond. Table 2.3 
presents a simple example of an FIS. 



Table 2.3. Example of an FIS 
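The selection of faults according to (2.81) amounts to a simple table lookup.
The sketch below uses a small hypothetical FIS with three faults and three
diagnostic signals; its entries are invented for the illustration and are not
the contents of Table 2.3. Each entry is the subset $V_{kj}$ of values that
the signal $s_j$ may take when the fault $f_k$ is present.

```python
# Hypothetical FIS: fault -> {signal -> set of admissible values V_kj}.
FIS = {
    "f1": {"s1": {1}, "s2": {0}, "s3": {-1, 1}},
    "f2": {"s1": {0}, "s2": {1, 2}, "s3": {1}},
    "f3": {"s1": {1}, "s2": {1}, "s3": {0, 1}},
}


def consistent_faults(observed):
    """Faults whose signature is consistent with the observed values.

    observed -- {signal -> observed value v(s_j)}
    A fault f_k is returned when v(s_j) lies in V_kj for every signal,
    which is the condition of (2.81).
    """
    return [
        fault
        for fault, signature in FIS.items()
        if all(observed[s] in allowed for s, allowed in signature.items())
    ]
```

For instance, the observation $s_1 = 1$, $s_2 = 0$, $s_3 = -1$ points to
`f1` only, while the observation $s_1 = s_2 = s_3 = 0$ returns an empty list.
The latter case is an example of a combination of diagnostic signal values to
which no fault corresponds.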
2.4.2.2. Other models

Mapping the space of multi-value diagnostic signals into the space of faults
or system states can also be realised in the form of a tree or if-then-type
rules. Diagnostic trees are a generalisation of binary trees (Fig. 2.10). The
number of branches coming out of each node that corresponds to a diagnostic
signal equals the number of values that the signal can have.

The relation existing between faults and diagnostic signal values can be
defined in the form of rules that most often have the following form:

$$\text{if } (s_1 = v_{k1}) \text{ and } \dots \text{ and } (s_j = v_{kj}) \text{ and } \dots \text{ and } (s_J = v_{kJ}) \text{ then fault } f_k. \tag{2.82}$$

Such rules correspond to the columns of the simple FIS. In the case of the
approximate information system, rules that correspond to particular faults
are as follows:

$$\text{if } (s_1 \in V_{k1}) \text{ and } \dots \text{ and } (s_j \in V_{kj}) \text{ and } \dots \text{ and } (s_J \in V_{kJ}) \text{ then fault } f_k. \tag{2.83}$$

As in the case of the binary diagnostic matrix, rules corresponding to FIS
rows can be defined. For the simple FIS, one obtains

$$\text{if } s_j = v_{kj} \text{ then fault } f_a \text{ or } \dots \text{ or } f_k \text{ or } f_n. \tag{2.84}$$
Rules having the form (2.82) and (2.83) are applied in fuzzy diagnostic
systems, including fuzzy neural networks. The fuzzy multi-value evaluation of
residuals is applied in this case instead of the threshold evaluation. Fuzzy
systems for fault isolation are discussed in greater detail in Chapter 11.

2.4.3. Models mapping the space of continuous diagnostic signals 
into the space of faults or system states 

Residuals generated on the basis of system models as well as process variables 
are continuous diagnostic signals. Models applied to fault isolation or system 
state recognition realise the following mappings: 

$$S \in \mathbb{R}^J \Rightarrow F \in \{0, 1\}^K, \tag{2.85}$$

$$S \in \mathbb{R}^J \Rightarrow Z \in \{0, 1\}^I. \tag{2.86}$$

The defined pattern regions (pictures) correspond to particular faults or sys- 
tem states in the space of continuous diagnostic signals. Classification meth- 
ods are applied to the modelling of such dependences and to fault isolation or 
system state recognition, e.g., classic methods of pattern recognition, neural 
networks, and neural fuzzy networks. 

2.4.3.1. Pattern pictures

The construction of a model for fault isolation or state recognition consists 
therefore in defining in the space of diagnostic signals regions which constitute 
pattern pictures of faults or states. Some examples of regions that correspond 
to faults in the two-dimensional space of diagnostic signals are shown in 
Fig. 2.11. Pattern regions can be defined in different ways. It is possible to 
single out geometrical, polynomial and statistical classifiers (Tadeusiewicz
and Flasiński, 1991). Pattern recognition methods are discussed in Chapter 14.









Fig. 2.11. Regions corresponding to faults in 
the two-dimensional space of residuals 
Pattern pictures can be obtained during the process of training. In order to
achieve this, it is necessary to possess training data for all of the faults
or system states. However, obtaining training data for states with faults is
extremely difficult, and for many industrial systems even impossible. The
data can instead be obtained by fault simulation with the use of an
analytical model of the system that takes the effect of faults into account.
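One simple geometrical way of defining pattern regions from training data is
the minimum-distance rule: each fault is represented by the centroid of its
training residuals, and a new residual vector is assigned to the nearest
centroid. The sketch below illustrates this rule only; it is not claimed to be
the classifier used in the cited works, and the function names and any data
passed to them are hypothetical.

```python
import math


def train_centroids(samples):
    """Compute one centroid per class.

    samples -- {label -> list of residual vectors of equal dimension},
               e.g. obtained by simulating particular faults
    """
    centroids = {}
    for label, vectors in samples.items():
        dim = len(vectors[0])
        centroids[label] = tuple(
            sum(v[i] for v in vectors) / len(vectors) for i in range(dim)
        )
    return centroids


def classify(residual, centroids):
    """Assign the residual vector to the class with the nearest centroid."""
    return min(centroids,
               key=lambda label: math.dist(residual, centroids[label]))
```

The regions obtained in this way are bounded by straight lines (hyperplanes)
equidistant from pairs of centroids. Regions of a more complex shape, such as
those in Fig. 2.11, require other classifiers.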

2.4.3.2. Neural networks

Pattern pictures of particular faults or system states can be mapped by a
neural network. In early papers (Hoskins and Himmelblau, 1988; Kramer and
Leonard, 1990; Sorsa and Koivo, 1993), one-directional multi-layer perceptron
networks shown in Fig. 2.4 were applied. Self-organising Kohonen-type
networks were also used (Sorsa and Koivo, 1993). Residuals generated on the
basis of system models, signal features or process variables act as network
inputs. Besides classifiers that have the form of a single neural network,
multi-network classifiers are also applied (Marciniak and Korbicz, 2001).
Neural classifiers are discussed in Chapter 9.

2.4.3.3. Fuzzy neural networks

Fuzzy neural networks applied to fault isolation realise the fuzzy evaluation 
of residual values as well as diagnostic inference (Koscielny, 2001; Syfert and 
Koscielny, 2001). The structure of a fuzzy neural network applied to fault 
isolation differs from the structures used for system modelling. It contains no 
layer in which defuzzyfication is carried out. The number of network outputs 
equals the number of distinguished faults or system states. 

Network weights can be defined during the process of training, and a 
fuzzy neural network is not a black box. Its structure corresponds to the set 
of realised rules. Therefore, the structure can also be defined on the grounds 
of an expert’s knowledge. This method is applied to industrial systems, for 
which it is not possible to obtain training data sequences for states with 
faults. The application of fuzzy neural networks to fault isolation is discussed 
in Chapter 11. 



2.5. Summary

Basic groups of models applied to technical diagnostics have been discussed 
in this chapter. Models used for fault detection as well as those applied to 
fault isolation or system state recognition have been singled out. The most 
important kinds of analytical, neural, and fuzzy models used for residual 
generation have been presented. In the group of models that describe the 
relation existing between diagnostic signals and faults, the following models 
have been discussed, among others: the binary diagnostic matrix, diagnostic 
graphs, logic rules and functions, the information system, pattern pictures of
faults (system states) in the space of residuals, and neural networks. 

The diversity of models used in the diagnostics of processes (systems) re- 
flects different degrees of knowledge about the diagnosed system as well as 
different methods of obtaining the knowledge. In the case of models applied 
to fault detection, the model structure is defined using physical equations, an 
expert’s knowledge (e.g., fuzzy models), or a search (e.g., neural networks). 
Identification (training) methods using experimental data are used for defin- 
ing model parameters. 

Models applied to fault isolation or system state recognition can be de- 
fined on the basis of: training, knowledge of the hardware redundant struc- 
ture, modelling the effect of faults on residual values, as well as an expert’s 
knowledge. The method of obtaining the knowledge depends on the specificity
of the diagnosed system. For instance, for unique, one-of-a-kind systems
such as chemical plants, the collection of training data for states with faults
is not possible. Particular faults appear rarely, their number is very high, 
and the diagnostic system should recognise their first appearances. For com- 
plex chemical plants, it is very difficult to work out analytical models that 
take into account the effect of faults on residual values. The application of 
an expert’s knowledge remains therefore the only method. However, in the 
case of turbines, physical models and very complex analytical models are 
constructed. Training data for states with faults can be obtained on the basis 
of examinations of physical or analytical models. For serially manufactured 
systems, it is possible to collect data from measurements carried out in states 
with faults. In such a case, system examinations that consist in an artificial 
introduction of faults, and even examinations that destroy the system are 
often applied. Expenditures for the elaboration of precise analytical models 
are also justified. 
Chapter 3



PROCESS DIAGNOSTICS METHODOLOGY



Jan Maciej KOSCIELNY*



3.1. Introduction 

This chapter is an introduction to the problems of the diagnostics of pro- 
cesses or systems. A general methodology of system diagnostics is presented 
by the description of such vital elements as fault detection, isolation and 
identification as well as the monitoring of system states. While diagnosing 
the system, it is necessary to ensure adequate distinguishability of its faults 
or states. Problems associated with the evaluation of faults (states) distin- 
guishability as well as the methods of choosing a detection algorithm set that 
ensures higher distinguishability are presented. The chapter should give the 
reader some understanding of different diagnostic methods and create a base 
for studying particular problems being the subject matter of the following 
chapters. 



3.2. Fault detection 

Fault detection is the process of generating diagnostic signals S on the
grounds of process variables X in order to detect faults. Detection algorithms 
should therefore extract symptoms. Diagnostic signals ought to contain in- 
formation on faults. The mapping of the space of process variables X into 
the space of diagnostic signals S as well as the evaluation of these signals in 
order to detect and signal fault symptoms takes place during the detection. 
In the diagnostics of processes, fault detection is automatically realised by a
diagnosing computer.

Fault detection methods can be divided into two general groups: methods
applying the relationships existing between process variables, and methods
based on the control of process variable parameters. The methods based on
the relationships between process variables require knowledge of
the system in the form of quantitative or qualitative models. Analytical, neural
as well as fuzzy models are used for detection. Diagnostic signals are
also generated by checking simple relationships existing between
process variables, such as the hardware redundancy of measuring lines, feedback
signal control, the control of consistency of signal change directions, etc.
(Koscielny, 1991; 2001). Such dependencies are known to
automatic control engineers and process operators, and the algorithms are easy
to implement.

In the methods belonging to the second group, fault symptoms are detected
exclusively on the grounds of the analysis and evaluation of the changes
of a single process variable. What is usually checked are the limits
(boundaries of credibility, an acceptable rate of change) of particular variables,
or a statistical or spectral analysis of the variables is carried out
in order to detect changes regarded as fault symptoms. Such methods are
relatively simple since they do not require knowledge in the form of
process models. Their disadvantages result from the limited amount
of diagnostic information carried by a single signal, as well as from the large
number and diversity of possible causes of signal parameter changes,
which makes it rather difficult to define the relationships between symptoms
and faults.



3.2.1. Fault detection using system models 

The most advanced fault detection methods use models of systems for
generating residuals. A diagram of the detection algorithm is shown in Fig. 3.1. The
algorithm consists of two parts. In the first one, the residual value
is calculated on the grounds of a model of the system. In the second one, this
value is evaluated and a diagnostic signal is generated.

Fig. 3.1. Diagnostic algorithm diagram applying a system model

Different kinds of models are applied: analytic, neural and fuzzy ones. 
The residual is calculated as: 

• a difference between the measured value of a process variable and its
value calculated on the basis of the model,

• a difference between the left-hand side and the right-hand side of the
equation that describes the system,

• a difference between the nominal and estimated values of a parameter 
of the model. 

Within the group of analytical methods applied to fault detection one 
can distinguish: 

• detection with the use of physical models (e.g., balances, equations
of motion, etc.),

• detection with the use of linear input-output-type models, 

• detection with the use of state observers or Kalman filters, 

• detection on the grounds of on-line identification. 

Residual generation methods based on analytical, neural and fuzzy mod- 
els are described below. Algorithms for the evaluation of the residual value 
and for making decisions on fault detection are also presented. 

3.2.1.1. Generation of residuals on the grounds of physical equations

Physical models are often non-linear, implicit with respect to the set of output 
signals. Let us look at the equations (1.10) to (1.12) written for the set of 
three tanks. In such cases, the residuals (2.6) to (2.8) are generated as a 
difference between the left-hand side and the right-hand side of the equation 
describing phenomena occurring in the system, 

$$L(y,u,t) = P(y,u,t). \tag{3.1}$$

A diagram for generating residuals is shown in Fig. 3.2. A complete model
of the system can be obtained directly from physical equations, e.g., balance
ones. Such a model reflects the static and dynamic properties of the system
in the whole range of operation, while linear models can be used only in the
neighbourhood of the nominal point of operation, for which the identification
of their parameters has been carried out. The generation of residuals on
the grounds of such (usually non-linear) models is the most reliable detection
method, provided the model is adequately accurate. The elaboration of
models on the grounds of physical equations is extremely difficult or outright
impossible for many systems, and parameter identification has its own, additional
difficulties. Thus, the application of this method is limited to systems
that are described by relatively simple equations.

Fig. 3.2. Diagram for residual generation (physical equations)
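The residual of the form (3.1) can be illustrated with a minimal sketch. The single-tank balance used here, with its names `area` and `c_out`, is a hypothetical stand-in chosen for brevity; it is not the three-tank model of the equations (1.10) to (1.12).

```python
import math

def tank_residual(h_prev, h_curr, q_in, dt, area=1.0, c_out=0.5):
    """Residual r = L - P for the assumed balance area*dh/dt = q_in - c_out*sqrt(h)."""
    lhs = area * (h_curr - h_prev) / dt        # L(y, u, t): accumulation term
    rhs = q_in - c_out * math.sqrt(h_prev)     # P(y, u, t): inflow minus outflow
    return lhs - rhs

# Fault-free step simulated with the same (hypothetical) model: the balance closes.
h0, q, dt = 4.0, 1.2, 0.1
h1 = h0 + dt * (q - 0.5 * math.sqrt(h0)) / 1.0
r_ok = tank_residual(h0, h1, q, dt)

# A leak removes an extra 0.3 units of flow: the balance no longer closes.
h1_leak = h0 + dt * (q - 0.5 * math.sqrt(h0) - 0.3) / 1.0
r_leak = tank_residual(h0, h1_leak, q, dt)
```

In this sketch the residual is close to zero in the fault-free case and close to the negative of the leak flow in the faulty one, which is the kind of information a later decision stage would evaluate.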



3.2.1.2. Generation of residuals on the grounds of system transmittance

Linear input-output-type models are used for generating residuals. They have
the form of continuous or discrete transmittances (transfer functions) obtained from linear
differential or difference equations. The equation on the grounds of which the
residual is generated is called the parity equation.

Dynamic parity equations calculated from transmittance models were 
developed in the 1980s by Gertler and his co-workers (Gertler, 1991; Gertler 
and Singer, 1990). A summary of investigations in this field can be found in 
(Gertler, 1998). Primary residuals are calculated in one of the following two 
ways: 



$$r_i(s) = y_i(s) - G_{i1}(s)u_1(s) - G_{i2}(s)u_2(s) - \cdots - G_{ip}(s)u_p(s), \tag{3.2}$$

$$\tilde{r}_i(s) = M_i(s)y_i(s) - L_{i1}(s)u_1(s) - L_{i2}(s)u_2(s) - \cdots - L_{ip}(s)u_p(s). \tag{3.3}$$

The number of residuals equals the number of output signals. The sets of 
parity equations are defined by one of the two formulas: 

$$r(s) = y(s) - G(s)u(s), \tag{3.4}$$

$$\tilde{r}(s) = M(s)y(s) - L(s)u(s). \tag{3.5}$$



The above equations define the procedure for generating primary residuals in
the computational form. Figures 3.3 and 3.4 show diagrams for calculating
residuals corresponding to the formulas (3.4) and (3.5).

Additional residuals can be obtained by multiplying primary residuals by 
adequately chosen transforms (polynomials or rational functions of a variable 
s for continuous models or a variable z for discrete ones): 

$$r^*(s) = V(s)r(s) = V(s)\left[y(s) - G(s)u(s)\right], \tag{3.6}$$

$$\tilde{r}^*(s) = W(s)\tilde{r}(s) = W(s)\left[M(s)y(s) - L(s)u(s)\right]. \tag{3.7}$$

The transforms $V$ or $W$ are chosen in a way that ensures the sensitivity
of particular residuals only to specific faults and their insensitivity to other
faults as well as disturbances.

Fig. 3.3. Diagram of residual generation (the equations (3.4))

Fig. 3.4. Diagram of residual generation (the equations (3.5))



Methods that use linear models of the system in the form of transmittances
for the generation of residuals allow an early detection of even
small parametric faults. This is achieved, however, at the cost of the necessity
to define sufficiently precise models, which is often very difficult. Residuals
have to be adequately sensitive to faults but, on the other hand, they should
be sufficiently insensitive to other changes such as natural disturbances existing
in the process, measurement noise or modelling errors. Linear models
describe the properties of the system in the neighbourhood of the point of
operation, so each change of this point can cause, just like faults, the
occurrence of non-zero residual values.
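The two computational forms of the primary residual can be sketched for a single input and output. The example assumes a hypothetical first-order discrete model G(z) = b z^-1 / (1 - a z^-1), so that M(z) = 1 - a z^-1 and L(z) = b z^-1; the parameter values and the injected sensor offset are illustrative only.

```python
def residual_output_form(y, u, a, b):
    """r(k) = y(k) - G(z)u(k) for G(z) = b z^-1 / (1 - a z^-1), cf. (3.4)."""
    y_model, r = 0.0, []
    for k in range(len(y)):
        if k > 0:
            y_model = a * y_model + b * u[k - 1]   # model driven by inputs only
        r.append(y[k] - y_model)
    return r

def residual_equation_form(y, u, a, b):
    """M(z)y(k) - L(z)u(k) with M = 1 - a z^-1, L = b z^-1, cf. (3.5)."""
    return [y[k] - a * y[k - 1] - b * u[k - 1] for k in range(1, len(y))]

# Simulate the fault-free process, then add a sensor offset from sample 30 on.
a, b = 0.9, 0.1
u = [1.0] * 60
y = [0.0]
for k in range(1, 60):
    y.append(a * y[k - 1] + b * u[k - 1])
y_faulty = [v + (0.2 if k >= 30 else 0.0) for k, v in enumerate(y)]

r = residual_output_form(y_faulty, u, a, b)       # index k
f = residual_equation_form(y_faulty, u, a, b)     # index k - 1
```

Under these assumptions both residuals stay at zero before the fault. Afterwards the first form holds the full offset of 0.2, while the second shows 0.2 at the onset and then settles at 0.02, because M(z) filters the constant offset.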
3.2.1.3. Generation of residuals using state equations

Parity equations are calculated also from equations of state. Such a method
of residual generation for linear dynamic systems was designed by Chow and
Willsky (1984), and further developed by Lou et al. (1986). Descriptions of
the method can also be found in the review papers (Frank, 1990; Gertler,
1991; Patton and Chen, 1991). Since the obtained parity equations contain
the values of the input and output signals from current and previous moments
of time, this kind of redundancy is also called time redundancy.

It is assumed in the method that the mathematical description of the
system in the form of the discrete equations of state (2.11) to (2.12)
is known. Substituting the equation of state (2.23) into the equation of
outputs for the $(k+1)$-th moment of time,

$$y(k+1) = Cx(k+1) + Du(k+1), \tag{3.8}$$

we obtain

$$y(k+1) = CAx(k) + CBu(k) + Du(k+1). \tag{3.9}$$

Similarly, for moments of time later by $r$, where $r > 1$, one can write

$$y(k+r) = CA^{r}x(k) + CA^{r-1}Bu(k) + \cdots + CBu(k+r-1) + Du(k+r). \tag{3.10}$$

Introducing $k' = k + r$, it is possible to transform the equations of outputs
in such a way that they correspond to a given sampling moment of time $k'$
as well as the preceding moments. Dropping the prime on the shifted time index,
the equations assume the form

$$Y(k) = Rx(k-r) + QU(k), \tag{3.11}$$



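where

$$Y(k) = \begin{bmatrix} y(k-r) \\ y(k-r+1) \\ \vdots \\ y(k) \end{bmatrix}, \quad U(k) = \begin{bmatrix} u(k-r) \\ u(k-r+1) \\ \vdots \\ u(k) \end{bmatrix},$$

$$R = \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{r} \end{bmatrix}, \quad Q = \begin{bmatrix} D & 0 & \cdots & 0 \\ CB & D & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ CA^{r-1}B & CA^{r-2}B & \cdots & D \end{bmatrix}. \tag{3.12}$$

For the system described by the following parameters: $n$ (state co-ordinates),
$p$ (inputs) and $q$ (outputs), the dimensions of the vectors $Y$,
$U$ and the matrices $R$, $Q$ are as follows:

$Y(k) \to (r+1) \times q$, $\quad U(k) \to (r+1) \times p$,

$R \to [(r+1) \times q] \times n$, $\quad Q \to [(r+1) \times q] \times [(r+1) \times p]$.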

Since the aim is to obtain the relationship between the input and output
signals within the period of time $[k-r, k]$, one should eliminate the state co-ordinates
from the equation (3.11). In order to do so, this equation should
be multiplied by the vector $w^T$ of dimension $(r+1) \times q$. In this way one
obtains a scalar equation:

$$w^T Y(k) = w^T R x(k-r) + w^T Q U(k). \tag{3.13}$$

The condition under which the state co-ordinates are eliminated is as follows:

$$w^T R = 0. \tag{3.14}$$

If the system is an observable one, the obtained equations are independent.
The vector $w$ can be chosen freely as long as the condition (3.14)
is fulfilled. Different forms of the vector $w$ as well as different shapes of the
obtained parity equations are therefore possible. Residuals can be obtained
by subtracting the left-hand side of the equation from the right-hand side:

The residual vector dimension equals $(q \times 1)$. The matrix $W$ can always be
obtained for a sufficiently high $r$. The minimum value of the time window $[y(k-r), y(k)]$
equals the dimension $n$ of the state vector if the system is an observable
one; otherwise the minimum value is the degree of the observable part of the
pair $(C, A)$ (Massoumnia and Van der Velde, 1988).

The method is equivalent to the method of residual generation on the
grounds of system transmittance, and shares all the advantages and limitations
associated with the application of linear models.
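The elimination of the state by the condition (3.14) can be sketched for a small example. The two-state system below is hypothetical, D = 0 is assumed, and the parity vector is obtained from the Cayley-Hamilton theorem rather than from a general null-space computation; the residual is taken as w^T[Y(k) - QU(k)], which is one possible sign convention.

```python
def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Hypothetical 2-state, single-input, single-output system with D = 0.
A = [[0.8, 0.1], [0.0, 0.5]]
B = [1.0, 0.5]
C = [1.0, 0.0]

# With r = n = 2 the rows of R are C, CA, CA^2. By the Cayley-Hamilton theorem
# A^2 - tr(A) A + det(A) I = 0, so w = [det(A), -tr(A), 1] gives w^T R = 0.
tr = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
w = [det, -tr, 1.0]

# Markov parameters CB and CAB fill the lower-triangular matrix Q (D = 0).
cb = dot(C, B)
cab = dot(C, matvec(A, B))
Q = [[0.0, 0.0, 0.0], [cb, 0.0, 0.0], [cab, cb, 0.0]]
wQ = [sum(w[i] * Q[i][j] for i in range(3)) for j in range(3)]

def parity_residual(y_win, u_win):
    """w^T [Y(k) - Q U(k)] over the window k-2, k-1, k."""
    return dot(w, y_win) - dot(wQ, u_win)

def simulate(x0, u, sensor_bias_from=None):
    x, y = list(x0), []
    for k, uk in enumerate(u):
        bias = 0.3 if sensor_bias_from is not None and k >= sensor_bias_from else 0.0
        y.append(dot(C, x) + bias)
        x = [xi + bi * uk for xi, bi in zip(matvec(A, x), B)]
    return y

u = [1.0, -1.0, 0.5, 2.0, 0.0, 1.0, 1.0, -0.5]
y_ok = simulate([3.0, -2.0], u)                       # initial state unknown to the test
y_bad = simulate([3.0, -2.0], u, sensor_bias_from=5)  # sensor bias from k = 5
r_ok = [parity_residual(y_ok[k - 2:k + 1], u[k - 2:k + 1]) for k in range(2, len(u))]
r_bad = [parity_residual(y_bad[k - 2:k + 1], u[k - 2:k + 1]) for k in range(2, len(u))]
```

In this sketch the residual does not depend on the initial state, stays near zero for fault-free data, and departs from zero in the first window that contains the biased sample.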
3.2.1.4. Generation of residuals on the grounds of state observers

Methods of generating residuals based on Luenberger observers have been
developed mainly by Clark, Frank and Patton with their co-workers. Among the more
important papers from this period one can mention (Clark, 1978;
Frank, 1987; 1990; 1991; 1992; Frank and Keller, 1980; 1984; Patton and
Chen, 1991; 1993; Patton et al., 1989).

Output signals estimated by the observer are compared with real signals,
and the differences are residuals (Fig. 3.5). The application of observers to
the generation of residuals differs therefore from typical observer applications
in automatic control, where their task is to reconstruct unmeasurable state
variables, not output signals.

In the classical application of a system model to residual generation,
the system output is calculated on the grounds of the input signals only.
In the observer, measurable outputs are used besides the input signals
for estimating the outputs. The idea behind the observer is the application of
feedback from the difference between the estimated and real outputs of the
system to correct the model through an adequately chosen feedback matrix
$H$. Feedback is necessary in order to compensate for different initial conditions
as well as to stabilise the observer in the case of unstable systems. A stable
observer ensures that the estimation error converges to zero for any initial conditions.




Fig. 3.5. Fault detection with the use of a state
observer, f - faults, d - disturbances



The output estimation error, i.e., the residual

$$r(k) = y(k) - \hat{y}(k), \tag{3.16}$$

can be obtained by subtracting (2.21) from the equation of outputs for the
system without faults (2.14). The residual

$$r(k) = Ce(k) - \Delta y \tag{3.17}$$











depends on the state estimation error $e(k) = x(k) - \hat{x}(k)$, which can be
obtained by subtracting (2.20) from (2.15), respectively:

$$e(k+1) = (A - HC)e(k) - Ed(k) - Ff(k). \tag{3.18}$$

It is possible to notice that in the absence of faults and disturbances
(they are zero) and when all eigenvalues of the matrix $(A - HC)$ lie inside the
unit circle (the stability condition for a discrete-time observer), the estimation
errors $e(k)$ and $r(k)$ decay to zero from any initial state tracking error.
However, the manifestation of any fault or
disturbance leads to the occurrence of non-zero values. Such a piece
of information becomes the basis for fault detection.
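The behaviour described above can be sketched with a scalar observer. The plant, the gain `h` and the additive process fault are hypothetical values chosen for illustration; the sign with which the fault enters the error dynamics follows this sketch, not necessarily the convention of (3.18).

```python
def observer_residuals(y, u, a, b, c, h, x_hat0=0.0):
    """Run x_hat(k+1) = a*x_hat + b*u + h*(y - c*x_hat); return r(k) = y(k) - c*x_hat(k)."""
    x_hat, r = x_hat0, []
    for yk, uk in zip(y, u):
        rk = yk - c * x_hat            # output estimation error, cf. (3.16)
        r.append(rk)
        x_hat = a * x_hat + b * uk + h * rk
    return r

# Hypothetical scalar plant x(k+1) = a*x + b*u + fault(k), y = c*x.
a, b, c, h = 0.95, 0.1, 1.0, 0.5       # observer pole a - h*c = 0.45, inside the unit circle
x, y, u = 2.0, [], [1.0] * 80
for k in range(80):
    y.append(c * x)
    fault = 0.2 if k >= 50 else 0.0    # additive process fault from k = 50
    x = a * x + b * u[k] + fault

r = observer_residuals(y, u, a, b, c, h)
```

The residual starts at the initial estimation error, decays geometrically with the observer pole, and settles at a non-zero value of about 0.2 / (1 - 0.45) once the fault is present.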

3.2.1.5. Generation of residuals using on-line identification

Faults appear not only as changes of system output values but also as
changes of the physical coefficients $p$ appearing in equations of motion,
such as resistances, capacitances, stiffnesses, etc. Such physical coefficients are
contained in the parameters $\theta$ of a system model. If one determines the values of
the coefficients on the grounds of the identification of the system model carried out
in real time, and compares them with nominal values, i.e., system
parameter values in the state of complete efficiency, the obtained differences
are residuals that contain information on faults. Such a detection method
was suggested and developed by Isermann (1984; 1991) and his co-workers
(Geiger, 1985; Goedecke, 1986).

The parameters of a system model are understood as constants that
appear in the mathematical description of the relations existing between the
input and the output signals of the system. One can distinguish static models
of the system, described by

$$y = \beta_0\varphi_0 + \beta_1\varphi_1(u) + \beta_2\varphi_2(u) + \cdots, \tag{3.19}$$

where $\varphi_0 = 1$ and $\varphi_i(u)$ is a known function of the input vector (e.g., $\varphi_i(u) = u_1u_2$,
$\varphi_i(u) = u_1^2$), as well as dynamic models, given by differential equations
linearised around the point of operation:

$$\frac{d^{n}y}{dt^{n}} + a_{n-1}\frac{d^{n-1}y}{dt^{n-1}} + \cdots + a_1\frac{dy}{dt} + a_0y = b_{m-1}\frac{d^{m-1}u}{dt^{m-1}} + \cdots + b_1\frac{du}{dt} + b_0u, \qquad n > m. \tag{3.20}$$

The parameters of a system model, $\theta^T = [\beta_0, \beta_1, \beta_2, \ldots]$ for (3.19) or
$\theta^T = [a_{n-1}, \ldots, a_1, a_0; b_{m-1}, \ldots, b_1, b_0]$ for (3.20), are more or less complicated
relationships existing most often between several physical coefficients. In order
to determine the system coefficients and their changes, a procedure was presented
by Isermann (1984) that comprises the following stages:

1. The definition of a system model for measurable input and output signals:

$$y = f(u, \theta) \tag{3.21}$$

with the help of theoretical modelling.

2. The definition of the relationships existing between the system parameters
$\theta_i$ and the physical coefficients $p_j$:

$$\theta = f(p). \tag{3.22}$$

3. The estimation of the system model’s parameters $\theta$ on the grounds of
measured input signals $u$ and output signals $y$.

4. The calculation of physical coefficients:

$$p = f^{-1}(\theta). \tag{3.23}$$

5. The definition of residuals $r_i$ as physical coefficient value changes,
i.e., differences between their nominal values (for the complete efficiency
of the system) and current values:

$$r = \Delta p = p_n - p. \tag{3.24}$$

6. The decision on the occurrence of a fault on the grounds of the evaluation
of physical coefficient value changes.

A fault detection diagram based on such an on-line identification of the
system model’s parameters is shown in Fig. 3.6. If the method is to lead
to good results, it is necessary to obtain an adequate model of the system
with the help of theoretical modelling (on the grounds of physical and chemical
laws) and to carry out a reliable identification of the system
model’s parameters. Identification requires an adequate excitation of the system,
so that the measurement data cover the whole range of its operation.
The disadvantages of the method are the high computational cost
of identifying the system model’s parameters in real time
and problems with the detection of additive faults.
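The stages of the procedure can be sketched for a first-order lag. Everything specific here is an assumption made for illustration: the discrete model y(k) = a*y(k-1) + b*u(k-1), the physical coefficients (time constant T and gain), the mapping a = exp(-ts/T), b = gain*(1 - a), and the batch least-squares estimate used in place of a recursive real-time estimator.

```python
import math

def estimate_arx1(y, u):
    """Least-squares estimate of (a, b) in y(k) = a*y(k-1) + b*u(k-1)."""
    syy = syu = suu = sy1 = su1 = 0.0
    for k in range(1, len(y)):
        syy += y[k - 1] * y[k - 1]
        syu += y[k - 1] * u[k - 1]
        suu += u[k - 1] * u[k - 1]
        sy1 += y[k - 1] * y[k]
        su1 += u[k - 1] * y[k]
    det = syy * suu - syu * syu        # non-zero only if the input excites the system
    return (sy1 * suu - su1 * syu) / det, (su1 * syy - sy1 * syu) / det

def physical_coefficients(a, b, ts):
    """Invert theta = f(p) for the assumed lag: a = exp(-ts/T), b = gain*(1 - a)."""
    return -ts / math.log(a), b / (1.0 - a)

def simulate(T, gain, ts, u):
    a = math.exp(-ts / T)
    y = [0.0]
    for k in range(1, len(u)):
        y.append(a * y[-1] + gain * (1.0 - a) * u[k - 1])
    return y

ts = 0.1
u = [1.0 if (k // 7) % 2 == 0 else -1.0 for k in range(200)]   # exciting square wave
T_nom, gain_nom = 2.0, 1.5
y = simulate(2.6, 1.5, ts, u)          # fault: time constant drifted from 2.0 to 2.6
T_est, gain_est = physical_coefficients(*estimate_arx1(y, u), ts)
residuals = (T_nom - T_est, gain_nom - gain_est)   # nominal minus current, cf. (3.24)
```

With noise-free data the estimated time constant recovers the drifted value, so the first residual is about -0.6 while the gain residual stays near zero, pointing to the coefficient that changed.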

3.2.1.6. Residual generation with neural and fuzzy models

Neural and fuzzy models described in Chapter 2 allow calculating the values
of system variables. Residuals are generated according to the diagram presented
in Fig. 3.7. A vital advantage of fuzzy and neural techniques is the
possibility of non-linear system modelling. Models of systems in the state of
complete efficiency are obtained on the grounds of experimental data with
the use of different training techniques. This is especially important when analytical
models of the system are not known. Such models reflect system
operation well within the range of signal changes covered by the data on which
they were trained.

Fig. 3.6. Fault detection using on-line identification

Fig. 3.7. Diagram of residual generation using fuzzy models



In automated industrial processes, both current measurement data and
archived values of process variables are available. This creates
convenient conditions for model building on the grounds of the measured
data from the system as well as an expert’s knowledge about the relationships
existing between the variables (the structure of the model). At the
same time, the rapid development of computer technology has removed the major
barrier of the high computational cost of tuning fuzzy and neural
models with the use of large data sets.

Neural networks are a black box. The elements of the structure as well as
the weights have no physical relation to the system structure and parameters.
The expert’s knowledge can be used only to define the set of input signals
needed for modelling a given output signal. On the other hand, the vital advantages
of neural networks are high robustness to disturbances and
the possibility of generalising the knowledge contained in the network.

The advantage of fuzzy neural networks is the possibility to combine
the expert’s knowledge with the available measurement data. The expert’s
knowledge is used to define the structure and initial values of the model’s
parameters. The model is not a black box: it is a set of rules that can be
interpreted and verified by the expert. However, the number of rules in fuzzy models
grows rapidly with the number of inputs and the number of
fuzzy sets for particular inputs. This limits their application to relatively
simple systems.



3.2.1.7. Algorithms for making a decision on fault detection
using residual value evaluation



Every fault detection algorithm that makes use of an analytical, fuzzy or 
neural model (Fig. 3.1) contains the decision part, in which the evaluation 
of the residual value takes place, and the decision about the detection of a 
fault is made together with a possible indication of this event in the form of 
an alarm. 

The simplest decision algorithm is the comparison of the absolute residual
value with its threshold value. The diagnostic signal $s_j$ takes the value
of one (the fault symptom is detected) if the threshold value $K$ has been
exceeded:

$$s(r_j) = \begin{cases} 0 & \text{if } |r_j| \le K, \\ 1 & \text{if } |r_j| > K. \end{cases} \tag{3.25}$$



In order to increase the insensitivity of detection to the effect of electromagnetic
disturbance pulses that act on the measured signals, one should
make the decision not on the grounds of an instantaneous value of the residual
but on the basis of its mean value in a moving window that contains
the $N$ last values of the residual. The following algorithm can be shown as an
example:

$$s(r_j) = \begin{cases} 0 & \text{if } \bar{r}_j(N) = \dfrac{1}{N}\displaystyle\sum_{n=0}^{N-1} r_j(k-n) \le K, \\[2ex] 1 & \text{if } \bar{r}_j(N) = \dfrac{1}{N}\displaystyle\sum_{n=0}^{N-1} r_j(k-n) > K. \end{cases} \tag{3.26}$$



The threshold value $K$ in the equations (3.25) and (3.26) can be chosen
freely on the grounds of operational experience, or calculated with the use
of statistical data that characterise residual value changes in the state of
normal operation.
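The difference between the instantaneous decision (3.25) and the moving-window decision (3.26) can be sketched as follows. The residual sequence and the threshold are made-up illustration data, and the window mean is compared by absolute value, which is an assumption about how a two-sided test would be applied.

```python
from collections import deque

def binary_signal(r, K):
    """Diagnostic signal per (3.25): 1 if |r| exceeds the threshold K, else 0."""
    return 1 if abs(r) > K else 0

def windowed_signals(residuals, K, N):
    """Diagnostic signals from the mean of the last N residuals, cf. (3.26)."""
    window, out = deque(maxlen=N), []
    for r in residuals:
        window.append(r)
        mean = sum(window) / len(window)     # shorter window during start-up
        out.append(1 if abs(mean) > K else 0)
    return out

# One isolated spike (sample 3) followed by a persistent offset (from sample 8).
res = [0.0, 0.1, -0.1, 1.8, 0.0, 0.1, -0.1, 0.0, 0.9, 1.0, 1.1, 1.0, 0.9, 1.0]
instant = [binary_signal(r, K=0.5) for r in res]
smoothed = windowed_signals(res, K=0.5, N=5)
```

In this example the instantaneous test raises an alarm on the single spike, whereas the windowed test ignores it and reacts to the persistent offset two samples after its onset. The price of the robustness is this detection delay.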

The mean value of the residual should equal zero when the system’s
model is an adequate one. In the absence of faults and disturbances, the
residual value changes are caused by measurement noise. One
can assume that the noise has a normal distribution with the expected value
equal to zero, described as follows:

$$f(r) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{r^2}{2\sigma^2}\right), \tag{3.27}$$

where $\sigma^2$ is the residual variance.

Knowing the residual value distribution, one can determine the threshold
value $K$ assuming a certain probability level of false alarms $\varsigma$. The parameters
are connected with each other as follows:

$$\int_{-K}^{K} f(r)\,dr = 1 - \varsigma. \tag{3.28}$$

One can calculate the threshold value $K$ on the grounds of tables for the
normal distribution (tables of values for the Laplace function).

A similar procedure can be used for calculating the threshold value for
the mean value of the residual in the moving window. In such a case, one
should know the mean value distribution:

$$f(\bar{r}) = \frac{1}{\bar{\sigma}\sqrt{2\pi}}\exp\left(-\frac{\bar{r}^2}{2\bar{\sigma}^2}\right), \tag{3.29}$$

i.e., one should define the normal distribution variance $\bar{\sigma}^2$ (the mean value equals
the residual mean value), and calculate the threshold value assuming the
probability of false alarms $\varsigma$:

$$\int_{-K}^{K} f(\bar{r})\,d\bar{r} = 1 - \varsigma. \tag{3.30}$$
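Instead of statistical tables, the threshold can be computed from the inverse of the normal distribution function. The sketch below assumes zero-mean Gaussian residuals and uses the name `alpha` for the accepted false-alarm probability. For the window mean it additionally assumes independent samples, so that the standard deviation of the mean is sigma / sqrt(window); correlated residuals would need a different variance.

```python
from statistics import NormalDist

def threshold(sigma, alpha, window=1):
    """K such that P(|r| > K) = alpha for zero-mean Gaussian residuals.

    With window > 1 the threshold applies to the mean of `window` residuals,
    assumed independent, whose standard deviation is sigma / sqrt(window).
    """
    sigma_mean = sigma / window ** 0.5
    return NormalDist(0.0, sigma_mean).inv_cdf(1.0 - alpha / 2.0)

K_single = threshold(sigma=0.2, alpha=0.01)               # about 2.576 * 0.2
K_window = threshold(sigma=0.2, alpha=0.01, window=16)    # four times smaller
```

Under these assumptions, averaging over 16 samples permits a threshold four times lower at the same false-alarm probability, which is why the windowed test can detect smaller faults.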

Statistical methods for residual value evaluation are described in detail in the
monographs by Basseville and Nikiforov (1993), as well as Gertler (1998).

Beside the binary evaluation, the multi-value one is also applied, especially
the three-value evaluation:

$$s(r_j) = \begin{cases} 0 & \text{if } r_j \in [-K, K], \\ -1 & \text{if } r_j < -K, \\ +1 & \text{if } r_j > K. \end{cases} \tag{3.31}$$

The residual evaluation can also be carried out with the use of fuzzy
logic. The fuzzy evaluation of residual values makes it possible to take into
account the uncertainty of diagnostic signal values due to disturbances in
the system, measurement noise, modelling errors, and difficulties with the
definition of threshold values.
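The three-value evaluation and a fuzzy counterpart can be compared in a short sketch. The fuzzy sets "negative", "zero" and "positive" and their trapezoidal shape, controlled by the hypothetical parameter `width`, are assumptions for illustration; the source does not specify membership functions here.

```python
def three_valued_signal(r, K):
    """Three-value evaluation per (3.31): the sign of the violation is kept."""
    if r > K:
        return 1
    if r < -K:
        return -1
    return 0

def fuzzy_signal(r, K, width):
    """Degrees of membership of residual r in the fuzzy sets negative, zero, positive.

    Membership in 'zero' falls linearly from 1 to 0 as |r| goes from
    K - width to K + width (a hypothetical trapezoidal shape).
    """
    ramp = min(1.0, max(0.0, (abs(r) - (K - width)) / (2.0 * width)))
    return {
        "negative": ramp if r < 0 else 0.0,
        "zero": 1.0 - ramp,
        "positive": ramp if r > 0 else 0.0,
    }

crisp = [three_valued_signal(r, K=0.5) for r in (-0.8, -0.2, 0.0, 0.5, 1.3)]
near_threshold = fuzzy_signal(0.5, K=0.5, width=0.1)
```

A residual exactly at the threshold is classified as 0 by the crisp rule, while the fuzzy evaluation reports roughly equal membership in "zero" and "positive", so a small amount of noise no longer flips the diagnostic signal.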
3.2.2. Fault detection using tests of simple relationships
existing between signals

3.2.2.1. Application of hardware redundancy

The application of hardware redundancy, i.e., of two or more devices carrying
out the same task, makes it possible to compare their operation and
to detect faults in the case of inconsistency. In automatic control systems, the
redundancy of measuring lines is most often applied. The application of two
independent measuring lines for the same physical quantity facilitates the
detection of faults in these lines. Measuring signal inconsistency is a fault
symptom for any of the lines. The residual value calculated as the difference
between the signals is the measure of the inconsistency:

$$r = y_1 - y_2. \tag{3.32}$$

In the case of simple redundancy, it is impossible to decide which of the lines
was damaged. Fault isolation is possible, however, with the help of
$K$-out-of-$N$-type redundancy. The application of hardware redundancy is a very
efficient, although costly, method of fault detection. This can change with the
development of modern multi-sensor measurement transducers.
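The difference between detection with two lines and isolation with three or more can be sketched as below. Comparing each reading with the median is one simple voting scheme assumed here for illustration; the threshold `K` and the readings are made-up values.

```python
def pair_residual(y1, y2):
    """Residual of two redundant measuring lines, cf. (3.32)."""
    return y1 - y2

def isolate_faulty_line(readings, K):
    """Index of the single line that disagrees with the others, else None.

    At least three redundant lines are needed; a line is suspected when its
    deviation from the median of all readings exceeds the threshold K.
    """
    if len(readings) < 3:
        return None                     # two lines allow detection, not isolation
    ordered = sorted(readings)
    median = ordered[len(ordered) // 2]
    suspects = [i for i, y in enumerate(readings) if abs(y - median) > K]
    return suspects[0] if len(suspects) == 1 else None

r = pair_residual(10.0, 12.5)                          # inconsistency detected
unknown = isolate_faulty_line([10.0, 12.5], K=0.5)     # cannot tell which line failed
faulty = isolate_faulty_line([10.0, 10.1, 12.5], K=0.5)
```

With two lines the residual reveals the inconsistency but the function declines to name a line; with three lines the third reading is isolated as the outlier.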

3.2.2.2. Application of feedback signals

Feedback signals are universally applied to fault detection in automatic con- 
trol systems. The comparison of these signals with control signals facilitates 
fault detection in lines that transmit control signals. As examples of the ap- 
plication of feedback signals one can mention: 

• the test of analogue/binary control lines in controlling devices by sending 
the output signal to analogue/binary inputs, 

• the test of the pump operation control line by the measurement of the 
pump operation state, 

• the test of the control signal line from a computer by the measurement 
of the servomotor’s piston rod position. 

The method is simple and efficient. However, it can be applied only to par- 
ticular parts of control systems. 

3.2.2.3. Test of statistical relationships existing between process variables

Statistical relationships existing between the time sequences of process variables
can be used for fault detection. The stationarity and ergodicity of
stochastic processes is assumed. The interdependence of time sequences of the
process variables $y$ and $z$ is defined by the cross-correlation function:

$$R_{yz}(m) = \lim_{N\to\infty}\frac{1}{2N+1}\sum_{i=-N}^{N} y(i)\,z(i+m), \tag{3.33}$$

as well as the covariance function:

$$C_{yz}(m) = \lim_{N\to\infty}\frac{1}{2N+1}\sum_{i=-N}^{N}\left[y(i)-\bar{y}\right]\left[z(i+m)-\bar{z}\right] = R_{yz}(m) - \bar{y}\,\bar{z}. \tag{3.34}$$

Changes of the values of these functions contain information on possible 
faults. 
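In practice only finite records are available, so the correlation and covariance functions are replaced by sample estimates. The sketch below uses such estimates for non-negative lags; the signals are synthetic (z is y delayed by two samples) and the "stuck sensor" fault is a hypothetical example.

```python
import random

def cross_correlation(y, z, m):
    """Finite-sample estimate of R_yz(m): mean of y(i)*z(i+m) over valid i (m >= 0)."""
    n = len(y) - m
    return sum(y[i] * z[i + m] for i in range(n)) / n

def cross_covariance(y, z, m):
    """Finite-sample estimate of the covariance function using overall sample means."""
    y_mean, z_mean = sum(y) / len(y), sum(z) / len(z)
    n = len(y) - m
    return sum((y[i] - y_mean) * (z[i + m] - z_mean) for i in range(n)) / n

rng = random.Random(0)
y = [rng.gauss(0.0, 1.0) for _ in range(500)]
z = [0.0, 0.0] + y[:-2]                       # z follows y with a delay of 2 samples
best_lag = max(range(6), key=lambda m: cross_covariance(y, z, m))

z_stuck = [1.0] * len(y)                      # sensor of z frozen at a constant value
cov_stuck = cross_covariance(y, z_stuck, 2)
```

For the healthy pair the covariance peaks at the true delay; for the frozen signal it collapses to zero at every lag, which is the kind of change that can serve as a fault symptom.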



3.2.2.4. Testing the relations existing between process variables

Even if quantitative mathematical models that connect process variables are not
known in industrial processes, some relationships existing between process
variables that are fulfilled in the states of the complete efficiency of systems
can usually be defined. They are known to system operators. They result
both from physical laws and from the technology of production. The diagnostic
signal is obtained on the grounds of tests of qualitative relationships
existing between process variables (Fig. 3.8). The relations existing between
the variables’ values, the consistency between the directions of their changes,
etc., can be tested in this way. The relationships existing between pressures along
the steam path of a power boiler are a good example: the pressure is highest
in the boiler and decreases along the steam flow direction.



Fig. 3.8. Test of simple qualitative relationships existing between process variables



The statement that the relationship is not fulfilled implies the occurrence of a 
fault symptom. Examples of the application of fault detection on the grounds 
of the relationships existing between process variable values were presented 
in (Koscielny, 1991; 2001; Koscielny and Pieniążek, 1993; 1994).

The relationships existing between process variable values have the advantage
that they can be defined on the grounds of knowledge possessed
by production engineers, automatic control personnel and process operators.
Detection algorithms are very simple. They do not require complex mathematical
models and are efficient in the detection of many faults.
Practically all catastrophic faults can be detected using only
such simple relationships. However, some parametric faults may not be detected.
For example, the test of the consistency of changes of the control
signal $U$ and the flow $F$ in the three-tank system makes the detection of
a blocked servomotor piston rod possible but does not detect parametric
faults related to the erosion of the valve plug or the sedimentation of deposits
that decrease the valve cross-section.
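Two of the qualitative tests mentioned in this section can be sketched in a few lines. The dead bands, the sample values and the assumption that the flow follows the control signal within one sampling step are all illustrative; a real test would have to allow for the dynamics of the valve and the process.

```python
def direction(delta, dead_band):
    """Sign of a change, ignoring changes smaller than the dead band."""
    if delta > dead_band:
        return 1
    if delta < -dead_band:
        return -1
    return 0

def direction_consistency_signal(u_prev, u_curr, f_prev, f_curr,
                                 dead_u=0.01, dead_f=0.01):
    """1 if the control signal changed but the flow did not follow its direction."""
    du = direction(u_curr - u_prev, dead_u)
    df = direction(f_curr - f_prev, dead_f)
    return 1 if du != 0 and df != du else 0

def pressure_order_signal(pressures, tolerance=0.0):
    """1 if pressures listed along the steam path fail to be non-increasing."""
    return 1 if any(b > a + tolerance for a, b in zip(pressures, pressures[1:])) else 0

valve_ok = direction_consistency_signal(0.4, 0.6, 2.0, 2.5)
valve_blocked = direction_consistency_signal(0.4, 0.6, 2.0, 2.0)
steam_ok = pressure_order_signal([14.2, 13.8, 13.1])
steam_bad = pressure_order_signal([14.2, 14.6, 13.1])
```

The consistency test flags the blocked piston rod because the flow does not react at all, but a slowly eroding valve plug would still move the flow in the right direction and pass unnoticed, in line with the limitation noted above.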
3.2.3. Methods of signal analysis and the testing of limits

Detection in this group of methods is carried out on the grounds of the 
evaluation of particular process variable parameters. In the case of methods 
of signal analysis on the basis of process variable value changes, the value of a 
parameter (attribute) of this variable (e.g., the mean value, root-mean-square 
value, etc.) is calculated and evaluated. A diagram of such a test is presented 
in Fig. 3.9. 





Fig. 3.9. Diagram of a diagnostic test with the
control of the process variable parameter value



In the simplest case, the diagnostic signal is calculated as a result of process
variable value evaluation (Fig. 3.10). An example can be the testing of
variable alarm limits.

Fig. 3.10. Diagram of a diagnostic test consisting
in controlling the process variable value



3.2.3.1. Analysis of statistical signal parameters

The analysis of changes of statistical signal parameters facilitates the detection
of measuring line faults and, in some cases, also particular faults of actuators
or the plant’s components. Most often, the analysis of the mean values or variances
of signals is applied. Fault detection consists in the current calculation
of parameter values in a defined time window and then in comparing them
with the nominal values calculated for the state of the complete efficiency of
the system.

The arithmetic mean of the sample can be calculated as follows:

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i. \tag{3.35}$$

It defines the value around which the variable $y$ fluctuates. The variance of
the sample is the measure of the deviation of the process variable $y$ from its
mean value according to the formula

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2. \tag{3.36}$$

In order to lower numerical errors, the following formulas (Niederliński, 1977)
are recommended for the calculation of the mean value and variance:

$$\bar{y} = y_1 + \frac{1}{n}\sum_{i=1}^{n}\left(y_i - y_1\right), \tag{3.37}$$

$$s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n}\left(y_i - y_1\right)^2 - \frac{1}{n}\left(\sum_{i=1}^{n}\left(y_i - y_1\right)\right)^2\right]. \tag{3.38}$$

Calculations of these parameters are most often used for variables that characterise
the quality of a product or a process (Niederliński, 1977). Examples
are the thickness of a rolled sheet, the thickness of paper in
the paper-making machine, or the density of sugar beet juice at the output
of the evaporation station in a sugar factory. The mean values and variances
of the control deviation can also be used for the evaluation of automatic control
loop operation. Changes of mean values and variances indicate faults, disturbances
or incorrect control of the system. The number of possible causes is
usually very high. Some events are detected only after a long time, which results
from system dynamics. The methods are therefore well suited for production
quality control but they do not ensure quick detection of faults.

The control of mean values and variances is justified for normally distributed
process variables, a condition that is usually fulfilled in practice due to
the high number of random disturbances.
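The numerical benefit of computing the mean and variance on data shifted by the first sample can be sketched as follows. The sample values are made up: a large offset of 1e9 with a spread of a few units, a case in which the unshifted one-pass formula is known to lose precision in floating-point arithmetic.

```python
def shifted_mean_and_variance(samples):
    """Sample mean and variance computed on data shifted by the first sample."""
    n = len(samples)
    y1 = samples[0]
    s1 = sum(y - y1 for y in samples)
    s2 = sum((y - y1) ** 2 for y in samples)
    mean = y1 + s1 / n
    variance = (s2 - s1 * s1 / n) / (n - 1)
    return mean, variance

def naive_variance(samples):
    """Unshifted one-pass formula; loses precision when the mean dwarfs the spread."""
    n = len(samples)
    s1, s2 = sum(samples), sum(y * y for y in samples)
    return (s2 - s1 * s1 / n) / (n - 1)

data = [1e9 + d for d in (4.0, 7.0, 13.0, 16.0)]
mean, var = shifted_mean_and_variance(data)     # exact variance of the data is 30.0
var_naive = naive_variance(data)                # suffers from cancellation
```

The shifted version works with the small differences from the first sample and reproduces the exact variance, whereas the unshifted formula subtracts two numbers of order 1e18 and cannot resolve a difference of this size.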

3.2.3.2. Spectral analysis

For spectral analysis purposes, time sequences of signals sampled with the
period $T_p$ are applied. It is assumed that the sequence being observed
is a realisation of a stationary discrete process that fulfils the hypothesis
of ergodicity. The condition of stationarity means that the expected value is
constant and the autocorrelation function depends solely on the time-shift
between the time sequence elements. The condition of ergodicity, in turn,
is fulfilled when averaging over the set of realisations of the stochastic
process gives the same results as the time-averaging of one of its possible
realisations.

Stochastic time-sequences can be described as functions of time by the
autocorrelation function:

$$R_{yy}(m) = \lim_{N \to \infty} \frac{1}{2N+1} \sum_{i=-N}^{N} y(i)\, y(i+m). \tag{3.39}$$

By calculating the Fourier transform of the correlation function it is possible
to obtain the function of spectral power density of the signal $S_{yy}$
according to the formula

$$S_{yy}(\omega) = \sum_{m=-\infty}^{\infty} R_{yy}(m)\, e^{-j \omega m T_p}. \tag{3.40}$$

In order to realise this transformation, the Fast Fourier Transform (FFT)
algorithm is applied.
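
A finite-sample version of both characteristics can be sketched in Python
with NumPy. This is only an illustration under stated assumptions: the
autocorrelation is a biased estimate from one finite record, and the spectral
density is estimated with a plain periodogram (the squared magnitude of the
FFT of the signal) rather than by transforming the correlation function
explicitly. The function names and the test signal are hypothetical.

```python
import numpy as np


def autocorrelation(y, max_lag):
    """Biased finite-sample estimate of R_yy(m) for m = 0..max_lag (mean removed)."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    n = len(y)
    return np.array([np.dot(y[: n - m], y[m:]) / n for m in range(max_lag + 1)])


def power_spectral_density(y, tp):
    """Periodogram estimate of S_yy on the frequency grid of the real FFT."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    n = len(y)
    spectrum = np.fft.rfft(y)
    psd = (np.abs(spectrum) ** 2) * tp / n
    freqs = np.fft.rfftfreq(n, d=tp)  # frequencies in Hz for sampling period tp
    return freqs, psd


# Hypothetical vibration-like signal: a 5 Hz sine sampled every 10 ms.
tp = 0.01
t = np.arange(1000) * tp
signal = np.sin(2 * np.pi * 5.0 * t)

r = autocorrelation(signal, 50)
freqs, psd = power_spectral_density(signal, tp)
peak_frequency = freqs[np.argmax(psd)]
```

A shift of `peak_frequency` or the appearance of new spectral peaks relative
to a reference spectrum recorded in the fault-free state would then be
treated as a symptom.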

In the state of the normal operation of a stationary system, particular
process variables, being random signals, possess a certain shape both of the
correlation function and the spectral power density $S_{yy}$. The occurrence
of faults results in definite changes of these characteristics, which makes
the detection of faults possible. Of great practical importance is the
vibro-acoustic diagnostics of machines with the use of the spectral analysis
of signals.

3.2.3.3. Methods of limit checking

The control of signal credibility, alarm values, signal speed changes, and tests 
of binary variable values can be assigned to this group of methods. The aim 
of controlling signal credibility is the detection of faults that can appear in 
measurement lines. The control consists in testing whether admissible values 
or speed changes of signals have not been exceeded. In some cases, a lack of 
signal changes is also detected. 

Algorithms for controlling the credibility limits of the variable y have
the form $y < Y_{MAX}$ and $y > Y_{MIN}$. The lower and higher credibility
limits $Y_{MIN}$, $Y_{MAX}$ are the technically possible minimum and maximum
values of the signal, respectively, in the state of efficiency. If no other
premises are present, the limits are defined on the grounds of the knowledge
of the measuring range of the applied measuring transducer as well as its
class of accuracy. One usually assumes then, as the limits of credibility,
the high and low limits of the range widened by an error resulting from the
class of the measuring instrument. However, in many cases the credibility
limits can be defined on the grounds of the knowledge of the system and
measurement conditions as considerably narrower than the measurement range.

In order to discern the zero value of a signal from faults such as a broken
measurement line or a lack of power supply, current signals having standard
values from the 4 to 20 milliamps range or voltage signals from the 1 to
5 volts range are used in automatic control systems. Signals having the value
of 4 milliamps or 1 volt denote the lower limit of the measured range, and a
signal value close to zero testifies to a break of the line. For this
reason, the range of the applied A/D converters is suitably wider than the
range of measuring transducer output signal changes. Exceeding the higher
limit can be caused, for example, by short circuits.
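
The role of the "live zero" can be sketched as follows. The function name
and the two threshold values (3.6 mA and 21.0 mA) are assumptions chosen for
illustration; in practice they would follow from the transducer and
converter data.

```python
def read_current_loop(current_ma, range_lo, range_hi,
                      break_below_ma=3.6, short_above_ma=21.0):
    """Convert a 4-20 mA signal to engineering units or report a line fault."""
    if current_ma < break_below_ma:
        return None, "line break or no power supply"
    if current_ma > short_above_ma:
        return None, "upper limit exceeded (possible short circuit)"
    # 4 mA corresponds to range_lo and 20 mA to range_hi.
    value = range_lo + (current_ma - 4.0) * (range_hi - range_lo) / 16.0
    return value, "ok"


mid_scale = read_current_loop(12.0, 0.0, 100.0)
broken = read_current_loop(0.2, 0.0, 100.0)
```

A genuine zero of the measured quantity is transmitted as 4 mA, so it cannot
be confused with the near-zero current of a broken line.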

The admissible speed of the signal changes $\Delta Y_{MAX}$ results from the
dynamics of the diagnosed system. In theory, the calculation of this parameter
requires the knowledge of the system's model and therefore is not applied.
In practice, a simple algorithm for controlling the admissible speed of signal
changes is used:

$$y_k - y_{k-1} < \Delta Y_{MAX}. \tag{3.41}$$

The admissible speed is usually defined on the grounds of operational
experience possessed by production engineers, automatic control personnel or
operators. Changes of process variable values that are archived by the
automatic control system are also very helpful in defining the speed.

Some kinds of faults can manifest themselves by the lack of signal
changes, i.e., changes lower than $\delta$:

$$|y_k - y_{k-n}| < \delta, \quad n = 1, \ldots, N. \tag{3.42}$$

Credibility control algorithms are very simple and should be applied as
standard in the software processing chain of each process variable value.
They are able to detect catastrophic faults (a break, a short circuit, a lack
of power supply), but cannot detect parametric faults of measuring lines.
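
The three credibility tests described above (range limits, admissible speed
of change, and lack of signal changes) can be combined in a short Python
sketch. The function name, the parameter names and the sample data are
hypothetical. As an assumption, the speed test is applied to the absolute
value of the increment so that changes in both directions are covered.

```python
def credibility_faults(samples, y_min, y_max, dy_max, delta, n_frozen):
    """Return (index, reason) pairs for samples that fail a credibility test."""
    faults = []
    for k, y in enumerate(samples):
        # Credibility limits: the value must lie strictly inside (y_min, y_max).
        if not (y_min < y < y_max):
            faults.append((k, "limit"))
        # Admissible speed of change (assumption: absolute increment).
        if k >= 1 and abs(y - samples[k - 1]) >= dy_max:
            faults.append((k, "rate"))
        # Lack of changes: all of the last n_frozen differences are below delta.
        if k >= n_frozen and all(
            abs(y - samples[k - n]) < delta for n in range(1, n_frozen + 1)
        ):
            faults.append((k, "frozen"))
    return faults


readings = [10.0, 10.4, 10.1, 25.0, 10.2, 10.2, 10.2, 10.2, 120.0]
result = credibility_faults(readings, y_min=0.0, y_max=100.0,
                            dy_max=5.0, delta=0.01, n_frozen=3)
```

The spike at index 3 is reported twice as a speed violation (on the way up
and on the way down), the constant run is reported once three identical
predecessors exist, and the last sample violates both the range and the
speed test.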

The control of limits of analogue variable values (Niederliński, 1977) is
the simplest method of fault detection, used for a long time in conventional
signalling and alarm systems, and later in computer-based automatic control
systems. Absolute or relative limits related to the set value can be controlled.

The control of absolute limits consists in detecting the exceeding of a
lower ($Y_{LO}$) or higher ($Y_{HI}$) alarm limit by the process variable y:
$y < Y_{LO}$ and $y > Y_{HI}$. The control algorithm is therefore the same as
the algorithm for controlling credibility limits. The alarm limits are,
however, much narrower than the credibility limits, and result not from
technical possibilities of variable changes but from production limitations.
In order to eliminate alarm flickering when the signal oscillates around the
limit, a hysteresis zone is applied (Fig. 3.11). The lower alarm is withdrawn
only after the variable has reached the value of ($Y_{LO} + \Delta H$), and
the higher alarm is withdrawn after it has reached the value of
($Y_{HI} - \Delta H$).
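
The effect of the hysteresis zone on the higher alarm can be sketched as
follows. The function name, the limit, the width of the zone and the level
readings are hypothetical values chosen for illustration only.

```python
def high_alarm_states(samples, y_hi, hysteresis):
    """Alarm is raised above y_hi and withdrawn at or below y_hi - hysteresis."""
    active = False
    states = []
    for y in samples:
        if not active and y > y_hi:
            active = True
        elif active and y <= y_hi - hysteresis:
            active = False
        states.append(active)
    return states


# Hypothetical level signal oscillating around the alarm limit of 80.0.
level = [78.0, 80.5, 79.8, 80.2, 79.1, 77.9, 78.5]
states = high_alarm_states(level, y_hi=80.0, hysteresis=2.0)
```

Without the hysteresis zone the alarm would switch off at 79.8 and on again
at 80.2; with it, the alarm stays active until the signal drops to 78.0 or
lower.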

The disadvantages of absolute limit control include, among other things,
the long time of detection, which depends on the dynamic properties
as well as on the system's point of operation at the time of the occurrence
of a fault. For example, the time of leak detection in a tank depends on the
cross-sectional area of the tank and on the level of liquid in the tank before
the occurrence of the fault. Moreover, control system operation tends to mask
these symptoms. A small leak in the tank can be compensated by a wider
opening of the control valve situated at the liquid inlet to the tank. It is
then possible that the lower alarm limit of the level will not be exceeded.
Alarm limit control cannot be used for all process variables. For example,
the flow signal from a transducer situated at the tank inlet can change its
value within the whole range during the compensation of the effect of
disturbances and faults on the controlled quantity (e.g., the level in the
tank Z 3 ). In such a case the alarm limits are equal to the signal
credibility limits.

Fig. 3.11. Diagram of the control of alarm limits (absolute)



The trend control algorithm is identical to the algorithm of the test of
the admissible speed of signal changes:

$$y(k) - y(k-1) \le \Delta Y_{DOP}. \tag{3.43}$$

However, the aim is to detect changes that are too quick and inadmissible for
production reasons. An example is the control of stress increase in
thick-walled elements of a power unit in order to ensure adequate safety and
reliability.

A simple detection method applied as standard in automatic control
systems is binary variable value control. Binary signals originate from limit
switches, contactors, limit signalling devices, etc. They transmit information
on the states of the operation of devices. Signal values are compared with a
reference state Yat, which corresponds to operation without faults, y = Yat.

The rule that the current should flow in the measuring circuit during the
normal state of operation is applied. A lack of the current therefore implies
either an incorrect state of the device or a lack of power supply in the
measuring circuit.

In the methods of limit control, fault symptoms are detected exclusively
on the grounds of the evaluation of the values of one process variable.
Detection algorithms are very simple since they do not require knowledge in
the form of system models. Their disadvantages result from the limited
diagnostic information carried by a single signal as well as from the
multiplicity and diversity of the possible causes of signal parameter
changes, which makes the definition of the relationships existing between
symptoms and faults rather difficult.

3.3. Fault isolation

Fault isolation is carried out on the basis of diagnostic signals generated by 
detection algorithms. The result of isolation is a diagnosis showing the faults 
or states of the system. The knowledge of the relationship that exists between 
the diagnostic signals and the faults or the technical states of the system is 
necessary for performing fault isolation. The ways of describing the relation- 
ship have been presented in Chapter 2. A completely reliable and unequivocal 
presentation of the existing faults or the definition of the diagnosed system 
state is not always possible due to incomplete and uncertain knowledge of the 
system, limited distinguishability of faults or states, uncertainty of diagnostic 
signals, etc. 

There exists a wide variety of isolation methods. Isermann and Balle
(1996) distinguish two basic groups: classification methods and automatic
reasoning methods. In (Koscielny, 2001), the way of obtaining knowledge
on the relationship existing between diagnostic signals and faults was
adopted as the main criterion for classifying the methods. The author
distinguished:

• methods in which the diagnostic relation results from the structure 
of mathematical models used for detecting faults (divided into two groups: 
methods without the modelling of the effect of faults and methods with the 
modelling of the effect of faults), 

• methods that require defining the symptoms-faults relation during the 
training phase, 

• methods based on an expert’s knowledge, 

• methods in which the diagnostic relation results from a redundant hard- 
ware structure. 

As regards describing the symptoms-faults relation, one can distinguish fault 
isolation methods using: 

• the K-out-of-N-type redundancy,

• logical functions, 

• IF-THEN-type rules,

• different kinds of graphs, 

• the binary diagnostic matrix, 

• the information system, 

• fault-attributed regions (clusters) in the space of diagnostic signals, 

• the neural network, 

• the fuzzy neural network. 

The main concepts used for fault isolation or the recognition of system
states are described in brief below. The main emphasis is put on the
presentation of inference methods. The problems of choosing an adequate
set of detection algorithms in order to ensure, among other things, the
required distinguishability of faults (states) will be dealt with in the
successive subchapters.







3.3.1. Diagnosing based on the binary diagnostic matrix 

The binary diagnostic matrix (Chapter 2.4.1) presents the relation existing 
between the values of bi-state diagnostic signals and faults. It can be designed 
using system equations taking the effect of faults into account or on the basis 
of an expert’s knowledge. Diagnostic inference carried out by means of the 
binary diagnostic matrix can be realised with the use of classical or fuzzy 
logic (Koscielny, 2001). The latter approach allows taking diagnostic signal 
uncertainty into consideration. 

The binary diagnostic matrix is used for fault isolation together with 
different fault detection methods under the stipulation that the diagnostic 
signals being the outputs of detection algorithms have to be binary ones. 
The matrix allows the formulation of diagnoses about single faults. The rules 
of parallel and series inference with the use of classical logic are presented 
below. Fuzzy diagnostic inference is described in Chapter 11. 

3.3.1.1. Rules of parallel diagnostic inference on the assumption
about single faults

Parallel diagnostic inference consists in formulating a diagnosis as a result of 
the comparison of the obtained diagnostic signals with signatures of particular 
faults (Koscielny, 1994; 1995b; 2001). It is assumed that only single faults 
exist. 

Fault isolation is carried out using the set of diagnostic signals S. The
inference procedure consists in comparing the obtained diagnostic signals:

$$V = \left[ v(s_1), v(s_2), \ldots, v(s_J) \right]^T \tag{3.44}$$

with the signature of the state of complete efficiency as well as the
signatures of particular faults:

$$V(f_k) = \left[ v_1(f_k), v_2(f_k), \ldots, v_J(f_k) \right]^T, \tag{3.45}$$

where $v(s_j), v_j(f_k) \in \{0, 1\}$. If all diagnostic signals equal zero,
the diagnosis shows a lack of faults (the state of complete efficiency):

$$\left[ \forall_{j:\, s_j \in S}\; v(s_j) = 0 \right] \Rightarrow \left[ DGN = z_0 \right].$$

When fault symptoms (diagnostic signal values equal to one) occur, the
diagnosis shows a subset of faults whose signatures are consistent with the
diagnostic signals:

$$DGN = \{ f_k \in F : V(f_k) = V \}, \tag{3.46}$$

where $V(f_k) = V \Leftrightarrow \forall_{j:\, s_j \in S}\, [v_j(f_k) = v_j]$.

The faults shown in the diagnosis (3.46) are indistinguishable with the 
given subset of diagnostic signals. 
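
The comparison of the observed signals with fault signatures can be sketched
in Python as follows. The binary diagnostic matrix used here is hypothetical
(four signals, four faults) and the function name is an illustration only.

```python
# Hypothetical binary diagnostic matrix: each fault maps to its signature
# (pattern values of the diagnostic signals s1..s4).
SIGNATURES = {
    "f1": (1, 0, 1, 0),
    "f2": (1, 1, 0, 0),
    "f3": (0, 1, 1, 0),
    "f4": (1, 1, 0, 0),  # same signature as f2, so f2 and f4 are indistinguishable
}


def parallel_diagnosis(observed, signatures):
    """Return the faults whose signatures equal the observed signal vector."""
    if not any(observed):
        return ["z0"]  # all signals equal zero: state of complete efficiency
    return sorted(f for f, sig in signatures.items()
                  if tuple(sig) == tuple(observed))


diagnosis = parallel_diagnosis((1, 1, 0, 0), SIGNATURES)
```

Here the diagnosis contains both `f2` and `f4`, which illustrates that faults
with identical signatures cannot be separated with the given set of signals.
An empty result would indicate that the observed vector matches no signature.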

The rule of parallel diagnostic inference is illustrated in Table 3.1. Notice
that this kind of inference can be counted among pattern recognition methods.



Table 3.1. Parallel diagnostic inference - the comparison of actual
diagnostic signals V with the signatures of faults V(f_k)

                      Pattern signals                     Real signals
S/F     f_1      f_2     ...    f_k        ...    f_K
s_1
s_2
...
s_j                             v_j(f_k)
...
s_J
        V(f_1)           ...    V(f_k)     ...    V(f_K)       V



3.3.1.2. Rules of series diagnostic inference on the assumption
about single faults

Series diagnostic inference consists in analysing subsequent diagnostic signal 
values and formulating the diagnosis step by step, each time narrowing the set 
of possible faults (Koscielny, 1991; 1995a; 2001). The fault isolation process 
begins after the first symptom has been observed. The symptom appearance 
implies the existence of one of the faults that a given diagnostic signal is 
sensitive to. This subset of possible faults is shown in the primary diagnosis 
formulated on the basis of the first observed symptom: 

$$DGN_1 = F(v(s_x) = 1). \tag{3.47}$$

Particular diagnostic signal values are analysed one by one. Depending on the 
result of this analysis, the sets of possible faults are appropriately reduced. 

If the diagnostic signal value equals zero, it means that there is no fault
controlled by this signal, i.e., no fault that belongs to the set $F(s_j)$:

$$s_j = 0 \Rightarrow \forall_{k:\, f_k \in F(s_j)}\; z(f_k) = 0. \tag{3.48}$$

If the diagnostic signal value equals one, it means that there exists a
fault belonging to the set $F(s_j)$:

$$s_j = 1 \Rightarrow \exists_{k:\, f_k \in F(s_j)}\; z(f_k) = 1. \tag{3.49}$$

If we assume that only single faults exist, the following rules for reducing
the set of possible faults indicated in successive steps of the formulation
of the diagnosis are applied.

If the diagnostic signal value equals zero (a positive result of the test),
it causes the reduction of the set of possible faults by the faults detected
by this signal:

$$s_j = 0 \Rightarrow DGN_p = DGN_{p-1} - DGN_{p-1} \cap F(s_j). \tag{3.50}$$

If the diagnostic signal value equals one (a negative result of the test),
it means that the new set of possible faults is the intersection of the
previous set of possible faults and the set of faults $F(s_j)$ detected by
the signal:

$$s_j = 1 \Rightarrow DGN_p = DGN_{p-1} \cap F(s_j). \tag{3.51}$$

Series inference consists in formulating an initial diagnosis after the first
symptom is observed, and refining the diagnosis as a result of the analysis
of successive diagnostic signals. In order to formulate the final diagnosis,
the analysis of all diagnostic signals is most often not necessary
(Koscielny, 1991; 2001). The order of diagnostic signal interpretation can be
fixed or can depend on various factors. It is useful, for example, to make
the order of choosing subsequent diagnostic signals dependent on the set of
faults indicated in the diagnosis that has been formulated in the preceding
step. The order of the analysis of diagnostic signals can also depend on the
diagnosed system's dynamics.
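
The step-by-step narrowing of the set of possible faults can be sketched in
Python. The sets F(s_j) below are hypothetical, as are the function name and
the stopping rule (the analysis stops once at most one candidate remains).

```python
# Hypothetical sets F(s_j): the faults each diagnostic signal is sensitive to.
SENSITIVITY = {
    "s1": {"f1", "f2", "f4"},
    "s2": {"f2", "f3"},
    "s3": {"f1", "f3"},
}


def series_diagnosis(observations, sensitivity):
    """observations: ordered (signal, value) pairs, the first being a symptom."""
    first_signal, first_value = observations[0]
    if first_value != 1:
        raise ValueError("isolation starts with an observed symptom")
    candidates = set(sensitivity[first_signal])  # primary diagnosis
    for signal, value in observations[1:]:
        if len(candidates) <= 1:
            break  # nothing left to distinguish
        if value == 0:
            candidates -= sensitivity[signal]  # remove faults detected by s_j
        else:
            candidates &= sensitivity[signal]  # keep only faults detected by s_j
    return candidates


diagnosis = series_diagnosis([("s1", 1), ("s2", 0), ("s3", 1)], SENSITIVITY)
```

The primary diagnosis {f1, f2, f4} is reduced to {f1, f4} by the zero value
of s2 and then to {f1} by the symptom on s3.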

3.3.1.3. Inference with the inconsistency of symptoms

During diagnostic inference, combinations of diagnostic signals inconsistent
with fault signatures (states of the system) can appear. They result from,
among other things, incorrect values of diagnostic signals caused by
measurement noise, model inaccuracy or changes of system parameters. The
simplest index of such inconsistency is the number of diagnostic signals
whose obtained values differ from the pattern values defined by a given
fault signature:

$$N_k = \sum_{j:\, s_j \in S} \left[ v_j \oplus v_j(f_k) \right], \tag{3.52}$$

where $\oplus$ is the modulo two addition.

The DGN(N)-type diagnosis shows faults for which the value of the
coefficient N is lowest:

$$DGN(N) = \left\{ f_k \in F : N_k = \min_{k:\, f_k \in F} \{ N_k \} \right\}. \tag{3.53}$$

The above index can be used during parallel diagnostic inference. During 
series diagnostic inference, it is useful to interpret diagnostic signals of the 
highest certainty at the very beginning. However, these problems will not 
be developed since the natural way of taking symptom uncertainties into 
account is the application of the Bayes theory or the fuzzy evaluation of 
residual values (Chapter 11). 
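
For binary signals the inconsistency index is simply the number of positions
at which the observed vector and a signature differ. A minimal Python sketch,
with a hypothetical set of signatures and hypothetical function names:

```python
def mismatch_counts(observed, signatures):
    """Number of signals whose observed value differs from the pattern value."""
    return {f: sum(v ^ p for v, p in zip(observed, sig))
            for f, sig in signatures.items()}


def closest_faults(observed, signatures):
    """Faults with the lowest inconsistency index."""
    counts = mismatch_counts(observed, signatures)
    best = min(counts.values())
    return sorted(f for f, n in counts.items() if n == best)


# Hypothetical signatures for four diagnostic signals.
SIGNATURES = {
    "f1": (1, 0, 1, 0),
    "f2": (1, 1, 0, 0),
    "f3": (0, 1, 1, 1),
}
observed = (1, 0, 1, 1)  # matches no signature exactly
counts = mismatch_counts(observed, SIGNATURES)
diagnosis = closest_faults(observed, SIGNATURES)
```

The observed vector differs from the signature of `f1` in one position only,
so `f1` is indicated although no signature is matched exactly.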

3.3.1.4. System states with multiple faults

For each diagnosed system it is possible to define a set of possible faults F,
understood as destruction events that worsen the quality of the operation of
the system (or of the system's element):

$$F = \{ f_k : k = 1, 2, \ldots, K \}. \tag{3.54}$$

To each element $f_k$ belonging to the set of faults F, one can attribute
the state $z(f_k)$ defined as follows:

$$z(f_k) = \begin{cases} 0 & \text{the fault } f_k \text{ did not occur,} \\ 1 & \text{otherwise.} \end{cases} \tag{3.55}$$

According to the equation (1.28), the technical state of the system z{t) is 
a function of faults. Let us assume here that the system state is defined by 
the set of the existing faults. Such an approach corresponds well with the 
specificity of fault isolation, especially for multiple faults. 

According to the above reasoning, let us assume that for the needs of
fault isolation, the state of the diagnosed system is defined by the states of
all of its faults belonging to the set F:

$$z = \{ z(f_1), z(f_2), \ldots, z(f_K) \}. \tag{3.56}$$

The set Z of all states $z_i$ of the diagnosed system,

$$Z = \{ z_i : i = 0, 1, \ldots, I \}, \tag{3.57}$$

can be expressed as the sum of the subsets of states having the number of
faults m from 0 to K:

$$Z = \bigcup_{m=0}^{K} Z_{(m)}, \tag{3.58}$$

where $Z_{(m)} = \{ z_i \in Z \mid \sum_{k=1}^{K} z(f_k) = m \}$ is the subset
of system states in which m faults appear simultaneously.

The state $z_i$ of the diagnosed system can be unequivocally described by
the set $F(1)_i$ of faults occurring in this state. Therefore, to each state
$z_i$ it is possible to attribute the subset $F(1)_i$ defined as follows:

$$F(1)_i = \{ f_k \in F : z(f_k)_i = 1 \}, \tag{3.59}$$

where $z(f_k)_i$ denotes the state of the fault $f_k$ in the state $z_i$ of
the system.

For example, if the set of faults of a system contains three faults (K = 3),
$F = \{f_1, f_2, f_3\}$, the system can be in one of the following eight
states:

$z_0 = \{z(f_1) = 0, z(f_2) = 0, z(f_3) = 0\} = \{0, 0, 0\}$,

$z_1 = \{1, 0, 0\}$, $z_2 = \{0, 1, 0\}$, $z_3 = \{0, 0, 1\}$,

$z_4 = \{1, 1, 0\}$, $z_5 = \{0, 1, 1\}$, $z_6 = \{1, 0, 1\}$, $z_7 = \{1, 1, 1\}$.

The set of faults existing in any given state corresponds to this state
according to the equation (3.59):

$F(1)_0 = \emptyset$, $F(1)_1 = \{f_1\}$, $F(1)_2 = \{f_2\}$, $F(1)_3 = \{f_3\}$,

$F(1)_4 = \{f_1, f_2\}$, $F(1)_5 = \{f_2, f_3\}$, $F(1)_6 = \{f_1, f_3\}$, $F(1)_7 = \{f_1, f_2, f_3\}$.

The set of all states contains the state of the complete efficiency of the
system (without faults) as well as the subsets of states with single, double
and all three faults:

$Z_{(0)} = \{z_0\}$, $Z_{(1)} = \{z_1, z_2, z_3\}$, $Z_{(2)} = \{z_4, z_5, z_6\}$, $Z_{(3)} = \{z_7\}$.
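
The enumeration of states in this example can be reproduced with a few lines
of Python. The variable and function names are hypothetical, and the order in
which the states are generated differs from the numbering used above.

```python
from itertools import product

faults = ["f1", "f2", "f3"]

# Every state is a tuple (z(f1), z(f2), z(f3)) of zeros and ones.
states = list(product((0, 1), repeat=len(faults)))


def existing_faults(state):
    """Set of faults that are present in the given state."""
    return {f for f, z in zip(faults, state) if z == 1}


# Group the states by the number m of simultaneous faults.
by_fault_count = {}
for state in states:
    by_fault_count.setdefault(sum(state), []).append(state)
```

For K faults the number of states is 2^K, which is the reason why the set of
analysed states usually has to be limited in practice.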

The binary diagnostic matrix describes the pattern values of diagnostic
signals in states having single faults. The table describing the pattern
values of diagnostic signals in all states of the system (Table 3.2) or in a
defined subset of these signals is called the table of states. A complete
table of states therefore contains the values of diagnostic signals in the
state of complete efficiency, in states with single faults, in states with
double faults, etc., and in the state with all possible faults of the system.

In the state of complete efficiency $z_0$, all diagnostic signals should
equal zero (positive test results). In states with single faults the
signatures are identical with the signatures of particular faults. In states
with multiple faults, defining pattern signals in the general case is not
simple. The effect of two or more faults on the operation of the system,
i.e., on diagnostic signal values, can differ. The symptoms can be
strengthened, the same as in the case of one of the faults, weakened, or
they may not appear at all in the particular case when the effects of
different faults compensate each other.

Table 3.2. Table of states of the diagnosed system

3.3.1.5. Parallel inference on the assumption about multiple faults

The assumption that only single faults exist is not always justified. In such
a case, one should take states with multiple faults into account in diagnostic
inference. It is usually sufficient to widen the set of the analysed states of
the system with the states with double, and possibly with triple faults. If
diagnostic inference is to be carried out taking multiple faults into
account, it is necessary to know their signatures, shown in the table of
states (Table 3.2).

The state of the system is unequivocally defined by the set of faults
$F(1)_i$ existing in this state (3.59). Signatures for states with multiple
faults are created using the binary diagnostic matrix (Gertler, 1998;
Koscielny, 1991; 1995a). The pattern value of the diagnostic signal $s_j$ in
the state $z_i$ is defined as the alternative (logical sum) of this signal's
values for all faults existing in this state:

$$v_j(z_i) = v_j(F(1)_i) = \bigcup_{k:\, f_k \in F(1)_i} v_j(f_k). \tag{3.60}$$

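In order to define pattern signals, one can also use the subsets of faults
$F(s_j)$ detected by particular diagnostic signals:

$$\begin{aligned} v_j(z_i) = 0 &\Leftrightarrow (F(s_j) \cap F(1)_i = \emptyset), \\ v_j(z_i) = 1 &\Leftrightarrow (F(s_j) \cap F(1)_i \neq \emptyset). \end{aligned} \tag{3.61}$$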



Defining the signatures in such a way is based on the assumption that 
each residual sensitive to a single fault that belongs to the subset F{sj) is also 
sensitive to multiple faults that belong to the same subset. The assumption 
is not always fulfilled in practice since the influences of faults on the residual 
value may be strengthened or compensated. However, the suggested approach 
is the only rational way to define pattern signals in states with multiple faults. 
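
The construction of a table of states from the binary diagnostic matrix can
be sketched in Python as follows. The matrix is hypothetical, the function
names are illustrative, and the sketch relies on the same assumption as the
text: a signal sensitive to any fault present in a state takes the value one.

```python
from itertools import combinations

# Hypothetical binary diagnostic matrix for three signals.
SIGNATURES = {
    "f1": (1, 0, 1),
    "f2": (1, 1, 0),
    "f3": (0, 1, 1),
}


def state_signature(fault_set, signatures):
    """Logical OR of the signatures of all faults present in the state."""
    n_signals = len(next(iter(signatures.values())))
    return tuple(int(any(signatures[f][j] for f in fault_set))
                 for j in range(n_signals))


def table_of_states(signatures, max_faults):
    """Pattern signals for all states with at most max_faults faults."""
    table = {}
    for m in range(max_faults + 1):
        for subset in combinations(sorted(signatures), m):
            table[frozenset(subset)] = state_signature(subset, signatures)
    return table


table = table_of_states(SIGNATURES, max_faults=2)
```

In this example all three double-fault states obtain the same signature
(1, 1, 1), so they cannot be distinguished from one another.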

The signature $V(z_i)$ of the i-th state of the system is represented by
the i-th column in the table of states:

$$V(z_i) = \left[ v_1(z_i), v_2(z_i), \ldots, v_J(z_i) \right]^T. \tag{3.62}$$



If the subsets of the pattern values of diagnostic signals are known in all
states $z_i \in Z$ of the system and the values of the diagnostic signals V
have been obtained as a result of fault detection, it is possible to
formulate the diagnosis. The diagnosis is understood as a credible hypothesis
that refers to the state of the diagnosed system. The diagnosis shows the
subset of system states that are indistinguishable with the given subset of
diagnostic signals S, for which the pattern values match the values obtained
during the realisation of detection algorithms:

$$DGN(Z) = \{ z_i \in Z : V(z_i) = V \}, \tag{3.63}$$

where $V(z_i) = V \Leftrightarrow \forall_{j:\, s_j \in S}\, [v_j(z_i) = v_j]$.

Instead of showing the numbers of system states, it is better to give in
the diagnosis subsets of faults that correspond to particular indistinguishable
states of the system:

$$DGN(F) = \{ F(1)_i : V(z_i) = V \}. \tag{3.64}$$

In practice it is sufficient to define a diagnosis that contains not all
states for which the pattern values of diagnostic signals are consistent with
real ones but only those states that contain a minimum number of faults. The
diagnosis DGN(F) in such a case has the following form:

$$DGN(F)^{*} = \left\{ F(1)_i : |F(1)_i| = \min_{i:\, V(z_i) = V} \{ |F(1)_i| \} \right\}. \tag{3.65}$$

It is necessary to stress that such a diagnosis shows the most probable
faults of the system since the probability of states decreases rapidly with the
number of faults. However, this way of inference can be deceptive because
some signatures of states with multiple (e.g., double) faults can be the same
as the signatures of other single faults. Such states are indistinguishable.
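
The selection of the states with the minimum number of faults, and the way in
which it can be deceptive, can be sketched in Python. The table of states is
hypothetical: the single fault f3 has been given the same signature as the
double-fault states on purpose.

```python
# Hypothetical table of states: set of existing faults -> pattern signals.
TABLE = {
    frozenset(): (0, 0, 0),
    frozenset({"f1"}): (1, 0, 0),
    frozenset({"f2"}): (0, 1, 0),
    frozenset({"f3"}): (1, 1, 0),
    frozenset({"f1", "f2"}): (1, 1, 0),
    frozenset({"f1", "f3"}): (1, 1, 0),
    frozenset({"f2", "f3"}): (1, 1, 0),
}


def consistent_states(observed, table):
    """All states whose pattern signals equal the observed signals."""
    return [faults for faults, sig in table.items() if sig == tuple(observed)]


def minimal_diagnosis(observed, table):
    """Only those consistent states that contain the fewest faults."""
    matches = consistent_states(observed, table)
    if not matches:
        return []
    fewest = min(len(faults) for faults in matches)
    return [faults for faults in matches if len(faults) == fewest]


all_matches = consistent_states((1, 1, 0), TABLE)
minimal = minimal_diagnosis((1, 1, 0), TABLE)
```

Four states are consistent with the observation, but the minimal diagnosis
indicates only the single fault f3; a simultaneous occurrence of f1 and f2
would therefore be overlooked.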

A vital problem in the case of diagnosing complex systems having a high 
number of faults is the necessity to limit the number of the analysed states 
of the system. It is possible to obtain this in the following ways (Koscielny, 
1991; 2001): 

a) an adequate decomposition of the system into subsystems and
decentralised diagnosing;

b) an adequate definition of the subset of possible states of the system
on the basis of the observed symptoms and a subsequent comparison of the
signatures of these states only with the set of the obtained values of diagnostic
signals. The set of the analysed diagnostic signals can be additionally limited
by taking into account only those signals which are necessary to recognise
any given damage situation. Therefore, the subsets of the table of states are
dynamically analysed.



3.3.1.6. Series inference on the assumption about multiple faults

The rules of series diagnostic inference on the assumption of the existence
of multiple faults are similar to those for single faults. The difference
consists in the fact that it is the set of possible states of the system, and
not the set of possible faults, that is reduced as a result of the analysis
of successive diagnostic signals. At the beginning, the set contains all
system states.

If the diagnostic signal value equals zero, it causes a reduction of the
set of possible states of the system by states whose signatures contain the
pattern value of the diagnostic signal that equals one:

$$s_j = 0 \Rightarrow DGN(Z)_p = DGN(Z)_{p-1} - \{ z_i : v_j(z_i) = 1 \}. \tag{3.66}$$

The detection of a symptom means that the new set of possible states is
reduced by states whose signatures contain the pattern value of the diagnostic
signal which equals zero:

$$s_j = 1 \Rightarrow DGN(Z)_p = DGN(Z)_{p-1} - \{ z_i : v_j(z_i) = 0 \}. \tag{3.67}$$

The process of inference ends after the analysis of all diagnostic signals 
has been carried out or after the lack of the possibility of further distin- 
guishing of system states has been ascertained. Such an algorithm is given 
by Koscielny (1991; 1995a). 
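
A minimal Python sketch of such a reduction of the set of possible states is
given below. It is not the algorithm from the cited works; the table of
states, the function name and the stopping rule are assumptions made for
illustration.

```python
# Hypothetical table of states for three diagnostic signals s1, s2, s3.
TABLE = {
    "z0": (0, 0, 0),  # no faults
    "z1": (1, 0, 1),  # f1
    "z2": (1, 1, 0),  # f2
    "z3": (0, 1, 1),  # f3
    "z4": (1, 1, 1),  # f1 and f2
}


def series_state_diagnosis(observations, table):
    """observations: ordered (signal index, observed value) pairs."""
    candidates = set(table)  # at the beginning all states are possible
    for j, value in observations:
        # Remove the states whose pattern value of signal j differs.
        candidates = {z for z in candidates if table[z][j] == value}
        if len(candidates) <= 1:
            break  # no further distinguishing is possible
    return candidates


diagnosis = series_state_diagnosis([(0, 1), (1, 1), (2, 0)], TABLE)
```

The symptom on s1 leaves {z1, z2, z4}, the symptom on s2 leaves {z2, z4},
and the zero value of s3 finally indicates the state z2.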



3.3.2. Diagnosing based on the information system 

The information system is a generalisation of the binary diagnostic matrix. It
allows applying a multi-valued evaluation of residuals, carried out
individually for each diagnostic signal. The information system can be defined
not only for faults but also for system states. In such a case it is an
extension not of the binary diagnostic matrix but of the table of states.

The following methods of parallel and series inference for single faults are
a generalisation of the corresponding algorithms formulated for the binary
diagnostic matrix.

3.3.2.1. Parallel diagnostic inference based on the information system

The general form of a diagnosis in the information system on the assumption
about single faults can be described by the following formula:

$$DGN = \left\{ f_k \in F : \forall_{j:\, s_j \in S}\; v_j \in r(f_k, s_j) \right\}. \tag{3.68}$$

The formula defines simultaneously the rules of diagnostic inference on the
assumption about single faults. The diagnosis shows faults whose signatures
are consistent with the obtained values of diagnostic signals. The consistency
means that the value of each diagnostic signal belongs to the subset of pattern
values $V_{kj} = r(f_k, s_j)$ defined in the Fault Information System (FIS)
(see Chapter 2.4.2.1).
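
The consistency test with multi-valued diagnostic signals can be sketched in
Python as follows. The content of the information system is hypothetical
(three-valued signals with the values -1, 0 and +1), as is the function name.

```python
# Hypothetical Fault Information System: FIS[fault][signal] is the set of
# pattern values that the signal may take when the fault is present.
FIS = {
    "f1": {"s1": {+1}, "s2": {0}, "s3": {-1, +1}},
    "f2": {"s1": {-1}, "s2": {+1}, "s3": {0}},
    "f3": {"s1": {+1}, "s2": {0, +1}, "s3": {+1}},
}


def fis_diagnosis(observed, fis):
    """Faults for which every observed value belongs to the pattern subset."""
    return sorted(f for f, row in fis.items()
                  if all(observed[s] in row[s] for s in observed))


diagnosis = fis_diagnosis({"s1": +1, "s2": 0, "s3": +1}, FIS)
```

Both `f1` and `f3` admit the observed values, so they remain
indistinguishable, whereas `f2` is excluded already by the value of s1.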

3.3.2.2. Series diagnostic inference based on the information system

Series diagnostic inference on the assumption about single faults is carried
out similarly as in the case of the binary diagnostic matrix. The fault
isolation process begins after the first symptom $s_x \neq 0$ has been
observed. The first diagnosis contains all faults detected by this symptom:

$$DGN_1 = \{ f_k \in F : s_x \in V_{kx} \}. \tag{3.69}$$

The values of particular diagnostic signals are analysed one by one.
According to them, the sets of possible faults are reduced in an appropriate
way. If the diagnostic signal value equals zero, it means that no fault
detected by the signal has appeared:

$$s_j = 0 \Rightarrow \forall_{k:\, f_k \in F(s_j)}\; z(f_k) = 0, \tag{3.70}$$

where $F(s_j) = \{ f_k : V_{kj} \neq 0 \}$ denotes the set of faults detected
by the j-th diagnostic signal.

If the diagnostic signal value is different from zero, it testifies to the
existence of a fault or a subset of faults belonging to the set $F(s_j)$:

$$s_j \neq 0 \Rightarrow \exists_{k:\, f_k \in F(s_j)}\; z(f_k) = 1. \tag{3.71}$$

If one assumes the existence of single faults, the following rules of reducing
the set of possible faults shown in successive steps of the formulation of the
diagnosis are applied:

If the diagnostic signal value equals zero (a positive result of the test,
$s_j = 0$), it causes a reduction of the set of possible faults by faults
detected by this test:

$$s_j = 0 \Rightarrow DGN_p = DGN_{p-1} - DGN_{p-1} \cap F(s_j). \tag{3.72}$$

If the diagnostic signal value is different from zero (a negative result of
the test, $s_j \neq 0$), it means that the new set of possible faults is the
intersection of the previous set of possible faults and the set of faults
$F(s_j)$ detected by a given signal $s_j$:

$$s_j \neq 0 \Rightarrow DGN_p = DGN_{p-1} \cap F(s_j). \tag{3.73}$$

The basic advantage of series inference is the possibility of giving the
current diagnosis at every moment of the diagnostic process. In order to
obtain the final diagnosis, the interpretation of all diagnostic signals is
usually not necessary, so it is possible to obtain shorter times of diagnosis
elaboration than in the case of parallel inference.

3.3.3. Methods of pattern recognition 

The possibility of applying pattern recognition methods in diagnostics results
from the assumption that patterns of a system being in a given technical
state are closer to each other than to system patterns in other states,
despite measurement errors, different random factors, etc. Pattern
recognition is a process consisting in describing the pattern and classifying
it on the grounds of its characteristic features. Pattern recognition
methods are described in (Andrews, 1972; James, 1988), and their application
to the diagnostics of technical systems can be found in (Himmelblau,
1978; Korbicz, 1998; Pau, 1981).

Both classical and neural classification methods are used in diagnostics.
Among the classical ones, one can distinguish geometrical, polynomial and
statistical classifiers. Classification can be precise or approximate (Cholewa
and Kiciński, 1997). The precise classification is based on criteria that
apply the measures of similarity or distance. During the approximate
classification, the criteria of similarity to the pattern or degrees of
affiliation to the pattern region are applied. Approximate classification
algorithms are based on the theory of fuzzy or rough sets (Pawlak, 1991).

The classical diagnostic diagram constructed on the basis of pattern 
recognition is presented in Fig. 1.9. It contains the block of the extraction of 
features (diagnostic signals) as well as the block of the classification of system 
states (faults). Mapping the process variable space V into the diagnostic 
signal space S is realised at the stage of diagnostic signal extraction. Mapping 
the diagnostic signal space into the fault space F or the system state space Z 
is realised in the classification phase. In order to carry out the classification, 
reference patterns for all distinguished states or a class of the states of the 
system are necessary. 

Solutions of this type have limited usefulness due to the static character
of mapping realised by the network. Moreover, the mapping cannot model
dynamic properties of the system in many states with faults. Because of
that, diagrams are applied in which fault detection is realised with the use of
system models. The values of residuals (Fig. 3.12) are supplied to the classifier
input in this case. Detection can be realised with the use of different kinds of
models, e.g., neural ones for the state of operation without faults.

Fig. 3.12. Diagram of the diagnostic system
with the use of the estimation of outputs

Another concept of the application of neural networks consists in
using a set of networks that recognise particular states of the system. Neural
models of the system are created for the state of complete efficiency as well
as for states with particular faults. This concept has been applied, for
example, to the diagnostics of a diesel engine (Ayoubi, 1994), and
to fault diagnostics in a two-tank system (Korbicz et al., 1999).

In order to carry out the classification of system states, pattern pictures 
are necessary for all distinguished states of the system. They are defined 
during the training process. In order to implement the classification, training 
data for all system states that are to be recognised are necessary. This creates 
the main difficulty in the application of the pattern recognition method to 
fault isolation in industrial processes (systems). 

Collecting measurement data for all states of the system is usually im- 
possible in the case of industrial processes, for which particular damage states 
appear rarely but are dangerous from the point of view of safety and cause 
considerable economic losses. Moreover, technical installations in the chemi- 
cal, power and food industries are most often unique ones or are manufactured 
in a short series. They often undergo construction changes. The number of 
possible faults is very high and particular faults appear extremely rarely. 
All of this makes the possibility of obtaining training data sequences that 
represent particular damage states exceedingly low. On the other hand, the 
diagnostic system should detect and recognise dangerous damages that have 
never appeared before. 

If the analytical model of the system is known, it is possible to simulate 
different states of the system and to define training data in such a way. A 
neural model tuned by means of a very complicated analytical model can 
be more useful for application in the current diagnostics of the system than 
the analytical model. However, if analytical models can be directly applied 
to fault detection in real time, the application of their neural equivalents is 
much less reasonable.

If the system should only carry out the diagnostics of measurement
lines, training data sequences can be obtained by an artificial introduction of 
changes of particular process variable values into the data obtained for the 
normal state. The changes simulate faults of particular measurement lines. 
This method, however, cannot be applied to process variables used in au- 
tomatic control systems since real faults of such measurement lines cause 
changes not only of these signal values but also of control signals, which has 
further consequences for system operation. 
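
The artificial introduction of measurement line faults can be sketched as follows. This is only an illustration under stated assumptions: the records are plain dictionaries, the variable names `L1` and `L2` and the bias value are hypothetical, and the fault is modelled as a constant additive bias. In line with the restriction above, it applies only to variables that are not used in control loops.

```python
def inject_sensor_fault(records, variable, start, bias):
    """Return a copy of `records` with `bias` added to `variable` from index `start` on.

    `records` is a list of dicts mapping process-variable names to values
    recorded in the normal state; the original data are left unchanged.
    """
    faulty = []
    for i, record in enumerate(records):
        record = dict(record)  # copy, so that the normal-state data stay intact
        if i >= start:
            record[variable] += bias
        faulty.append(record)
    return faulty


# Hypothetical normal-state data and a simulated bias fault of the L1 line.
normal = [{"L1": 0.50, "L2": 0.30} for _ in range(5)]
faulty = inject_sensor_fault(normal, "L1", start=2, bias=0.05)
labels = ["normal"] * 2 + ["fault_L1_line"] * 3
```

The pairs of `faulty` records and `labels` can then serve as a training data sequence for the state with the simulated fault.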

In the case of devices that are mass products (e.g., motors, control valves, 
pumps, etc.), obtaining measurement data that characterise states with par- 
ticular faults is possible, especially in the case of experiments conducted by 
the manufacturers of these devices. 



3.3.4. Recognition of directions in the space of residuals 

In the case of fault detection carried out with the use of system models, a
set of residuals is generated. The space spanned by the residuals is
called the residual space, or the parity space (Patton and Chen, 1991). There
exist two concepts of fault isolation on the basis of the set of generated
residuals:

a) Directional residuals. In order to make fault isolation possible, the set
of residuals is designed in such a way that each fault moves the residual
vector in a specific direction of the parity space, unique for that fault.
One can therefore assign to each of the faults
an individually designed directional vector (Gertler, 1998; Patton and Chen,
1991; Potter and Suman, 1977). This is illustrated in Fig. 3.13.




Fig. 3.13. Directional residuals

b) Structured residuals. It is assumed that particular residual values are
evaluated in the binary way (0 - positive result, 1 - negative result). A binary 
residual (diagnostic signal) equals one if there occurs a fault that the residual 
is sensitive to. The sensitivity is written down in the binary diagnostic matrix, 
which is defined on the Cartesian product of the set of binary residuals and 
faults. If the residual is sensitive to a fault, an appropriate element of the 
matrix becomes equal to one, otherwise it is equal to zero (zeroes in the binary 
diagnostic matrix will be omitted) . In order to obtain fault distinguishability, 
each of the faults should cause the occurrence of a set of the binary values 
of residuals (diagnostic signals) that is different from all of the other ones 
(Gertler, 1998; Gertler and Singer, 1990). This is illustrated in Fig. 3.14. 




Fig. 3.14. Structured residuals 

The fault isolation rule in the case of structured residuals consists in inference 
on the basis of the binary diagnostic matrix described in Chapter 3.3.1. The 
concept of directional residuals will be discussed with the help of a simple 
example of the hardware redundancy of measurement lines. 

Figure 3.15 presents a diagram of the hardware redundancy of measure- 
ment lines. The same physical quantity is measured by three measurement 
transducers. The output signals $y_i$ depend on the measured quantity (state
co-ordinate) $x$ as well as on the occurring faults $f_i$:

$$y_1 = x + f_1, \quad y_2 = x + f_2, \quad y_3 = x + f_3. \tag{3.74}$$

When no faults exist, the residual values should equal zero. Let us consider the
following variant of the set of residuals used for fault detection and isolation:

$$\begin{cases} r_1 = y_1 - y_2 = f_1 - f_2, \\ r_2 = y_2 - y_3 = f_2 - f_3, \\ r_3 = y_3 - y_1 = f_3 - f_1. \end{cases} \tag{3.75}$$

Fig. 3.15. Diagram of measurement lines redundancy



These residuals result directly from the comparison of particular pairs
of the measurement signals. The equations (3.75) contain both the
computational form (the dependence of the residuals on the output signals)
and the internal form (the dependence of the residuals on the faults). In
matrix notation, the above set of residuals has the form



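$$r = V y = \begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \\ -1 & 0 & 1 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = \begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \\ -1 & 0 & 1 \end{bmatrix} \begin{bmatrix} f_1 \\ f_2 \\ f_3 \end{bmatrix}. \tag{3.76}$$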



Let us interpret the equations (3.76) as directional residuals. To each of the
faults, there corresponds a direction in the space of residuals that is defined
by an appropriate column of the matrix $V$. For instance, a direction defined
by the first column of this matrix corresponds to the fault $f_1$. Figure 3.16
presents directions in the parity space that correspond to the faults of
particular measurement lines.
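
The isolation rule implied by these directions can be sketched as follows. The code is a minimal illustration, not a method taken from the text: it assumes that the alignment between the residual vector and each column of $V$ is measured by the absolute value of the cosine (so that positive and negative faults share one direction), and the detection threshold is a hypothetical value.

```python
import math

# Columns of the matrix V from (3.76): the direction assigned to each fault
# in the residual space (r1, r2, r3).
FAULT_DIRECTIONS = {
    "f1": (1.0, 0.0, -1.0),
    "f2": (-1.0, 1.0, 0.0),
    "f3": (0.0, -1.0, 1.0),
}


def residuals(y1, y2, y3):
    """Computational form of the residuals (3.75)."""
    return (y1 - y2, y2 - y3, y3 - y1)


def isolate(r, threshold=0.1):
    """Return the fault whose direction is best aligned with `r`, or None.

    `threshold` is a hypothetical detection limit on the residual norm.
    """
    norm = math.hypot(*r)
    if norm < threshold:
        return None  # residuals close to zero: no fault detected

    def alignment(direction):
        dot = sum(a * b for a, b in zip(r, direction))
        return abs(dot) / (norm * math.hypot(*direction))

    return max(FAULT_DIRECTIONS, key=lambda f: alignment(FAULT_DIRECTIONS[f]))


# Measured quantity x = 2.0 with a fault of +0.5 on the second line.
print(isolate(residuals(2.0, 2.5, 2.0)))  # "f2"
```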




Fig. 3.16. Directional residuals for the set of residuals (3.75)

Let us notice that each fault produces a nonzero residual vector in a
direction that is specific for the fault. Therefore, one can isolate the
appearing faults on the basis of the analysis of the residual vector
direction. For comparison, the binary diagnostic matrix for the set of
residuals (3.75) has the following form (Table 3.3):



Table 3.3. Binary diagnostic matrix for the residuals (3.75) 
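
The entries of such a matrix follow directly from the internal form of (3.75): a residual is sensitive to a fault whenever the corresponding coefficient is nonzero. A minimal sketch that derives the entries from the matrix $V$ of (3.76) is given below; the variable names are illustrative only.

```python
# Rows of the matrix V from (3.76): coefficients of f1, f2, f3 in each residual.
V = [
    [1, -1, 0],   # r1 = f1 - f2
    [0, 1, -1],   # r2 = f2 - f3
    [-1, 0, 1],   # r3 = f3 - f1
]

# A residual is sensitive to a fault if the coefficient is nonzero.
binary_matrix = [[1 if coeff != 0 else 0 for coeff in row] for row in V]

for name, row in zip(("r1", "r2", "r3"), binary_matrix):
    print(name, row)
```

Each residual is thus sensitive to exactly two faults, and every fault has a different signature (column).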

During the design of directional residuals, the knowledge of the dynamics
of the effect of faults on the system outputs is necessary. If the diagnosis is
carried out with the use of linear input-output models, not only the
computational form must be known:

$$r(s) = y(s) - G(s)u(s), \tag{3.77}$$

but also the internal form:

$$r(s) = G_F(s) f(s), \tag{3.78}$$

which defines the dependence of the residuals on the faults.

To design directional residuals from the primary residuals described by the
equation (3.78), one should define the direction $\beta_k$ in the space of
residuals for each of the faults $f_k$. The direction should be specific for
this fault. The task is reduced to defining the transmittance matrix $V(s)$
in such a way that the secondary residuals

$$r^*(s) = V(s) r(s) = V(s) G_F(s) f(s) \tag{3.79}$$

are such that the appearance of the $k$-th fault causes the residual vector
$r^*$ to change in the direction $\beta_k$.

Fault isolation consists therefore in comparing the residual vector di- 
rection with pattern directions that characterise particular faults (Gertler, 
1998). To design the residual vector pattern directions, the knowledge of the 
dynamics of the effect of particular faults on the system outputs is necessary. 
Such data can be easily obtained for the faults of measurement lines. Obtain- 
ing adequate transmittances for the faults of system elements and actuators 
requires modelling of the system taking into account these faults, which is 
very difficult and sometimes outright impossible for many industrial systems.

3.3.5. Other methods

Many other fault isolation methods are applied beside the ones presented 
above. It is possible to distinguish: 

• Methods of inference in diagnostic expert systems (Chapters 15 
and 16). 

• Diagnostic graphs. One can single out Signed Directed Graphs (SDG)
(Iri et al., 1980; Montmain and Leyval, 1994; Shiozaki et al., 1985),
AND/OR/NOT graphs (Ligęza and Fuster-Parra, 1997; see Chapter 16),
and bond graphs. Fault isolation in these methods consists in defining graph 
paths, which can explain an incorrect operation of the system. Bond graphs 
form a method of physical system modelling that was described in detail in 
(Thoma and Bouamama, 2000). They describe transformation, energy stor- 
age and dissipation within the system, and present these processes in the 
form of a graph that connects process variables. They result from physical 
equations that describe the system. Diagnostics with the application of bond 
graphs was used by Mosterman et al. (1995). 

• Diagnostic inference on the basis of inconsistency. This group of meth- 
ods was initiated by Reiter (1987). Diagnoses are formulated in this approach 
on the basis of the analysis of inconsistencies detected in the operation of the 
system. Inconsistencies are equivalents of the binary evaluation of residual 
values. The diagnosed system’s model defined for the normal state (complete 
efficiency) is used. Inconsistencies are detected as a result of the comparison 
of the system operation and the model on the basis of measurement data. The 
detected inconsistencies are the basis for generating the set of conflicts un- 
derstood as system element sets that contain a faulty element. The diagnosis 
indicates possible combinations of faulty elements. The method is described 
in Chapter 16. 

• Statistical methods, among which the method of principal component
analysis, PCA (Chiang et al., 2001; Russell et al., 2001), is the most popular
one.

• Application of stochastic automata to the diagnostics of dynamical
systems (Lunze, 2000).



3.4. Fault distinguishability 

The possibility of fault distinguishability is vital during the design of a diag- 
nostic system. One usually aims to obtain the distinguishability of all faults. 
The accuracy of the obtained diagnoses is defined by the number of faults 
indicated in the diagnosis (Koscielny, 1991; 2001). The lower the number, the 
more accurate the diagnosis. Therefore, an increase in fault distinguishabil- 
ity leads to an increase in diagnostic accuracy. The problems related to fault 
distinguishability for the binary diagnostic matrix, the information system 
and pattern pictures in the space of diagnostic signals are discussed below.

3.4.1. Fault distinguishability based on the binary diagnostic matrix

When one knows the diagnostic matrix, it is easy to verify whether the
detection of all faults $f_k \in F$ is possible. If a diagnostic signal
sensitive to a particular fault exists for all faults:

$$\forall_{f_k \in F} \; \exists_{s_j \in S} \; [v_j(f_k) = 1], \tag{3.80}$$

then the set of diagnostic signals is sufficient for the needs of fault detection. 
If a column of the binary diagnostic matrix contains only zeroes, then the 
fault corresponding to this column is not detected. 

Two faults, $f_k$ and $f_m$, are indistinguishable (remain in the relation
$R_{NF}$) on the basis of the binary diagnostic matrix if and only if their
signatures (matrix columns) are identical:

$$f_k R_{NF} f_m \Leftrightarrow [f_k, f_m \in F] \wedge \forall_{s_j \in S} \; [v_j(f_k) = v_j(f_m)]. \tag{3.81}$$

The relation $R_{NF}$ is also called the relation of fault inseparable control. It
divides the set of faults $F$ into the subsets of indistinguishable faults. This
problem was discussed in detail by Koscielny (1991).

The distinguishability of all faults appears only when all columns of the
binary diagnostic matrix are different:

$$\forall_{f_k, f_m \in F,\; k \neq m} \; \exists_{s_j \in S} \; [v_j(f_k) \neq v_j(f_m)]. \tag{3.82}$$

Such distinguishability is defined by Gertler (1991; 1998) as weakly isolating.
He also defined strongly isolating distinguishability. The strongly isolating
distinguishability (of the 1-st degree) occurs when the signatures of any
two faults differ at least at two positions. A generalisation of this definition
is the strongly isolating distinguishability of the $k$-th degree, in which any
two columns of the table must differ at least at $(k + 1)$ positions.
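
The conditions (3.80)-(3.82) and the degree of isolation can be checked mechanically for a given matrix. The following is a minimal sketch; the function names and the convention of returning -1 for indistinguishable faults and 0 for weakly isolating matrices are assumptions made for this illustration.

```python
from itertools import combinations


def signatures(matrix):
    """Return the fault signatures, i.e. the columns of the binary diagnostic matrix."""
    return [tuple(column) for column in zip(*matrix)]


def undetectable(matrix):
    """Indices of faults whose column contains only zeroes, cf. (3.80)."""
    return [k for k, sig in enumerate(signatures(matrix)) if not any(sig)]


def indistinguishable_groups(matrix):
    """Groups of fault indices with identical signatures, cf. (3.81)."""
    groups = {}
    for k, sig in enumerate(signatures(matrix)):
        groups.setdefault(sig, []).append(k)
    return [group for group in groups.values() if len(group) > 1]


def isolation_degree(matrix):
    """Minimum number of differing positions between any two signatures, minus one.

    -1: some faults are indistinguishable; 0: weakly isolating;
    k >= 1: strongly isolating of the k-th degree.
    """
    return min(
        sum(a != b for a, b in zip(s1, s2))
        for s1, s2 in combinations(signatures(matrix), 2)
    ) - 1


# Rows are diagnostic signals, columns are faults (0-based indices).
example = [
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 1],
]
print(undetectable(example), indistinguishable_groups(example), isolation_degree(example))
```

For the example matrix every pair of columns differs at two positions, so the sketch reports strongly isolating distinguishability of the 1-st degree.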

Table 3.4 presents examples of binary diagnostic matrices fulfilling the 
condition of a strongly isolating distinguishability. 

Binary matrices in which the number of zeroes (ones) in each of the 
columns (rows) is identical but their distribution is different are called 
column-canonical (row-canonical). The first and the second matrices are 
column- and row-canonical ones. 

A special example of a canonical matrix is the unitary matrix shown in 
Table 3.5. It contains ones at its diagonal. The number of tests equals the 
number of faults J = K. 

A characteristic feature of the matrix is that each test (diagnostic signal)
detects one fault only, different than those detected by other tests. Therefore
the equations (2.50) and (2.51) can be simplified to the form $F(s_j) = f_j$,
$S(f_k) = s_k$.

Table 3.4. Binary diagnostic matrices that ensure the
strongly isolating distinguishability (of the 1-st degree)




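Table 3.5. Unitary diagnostic matrix

| S/F | $f_1$ | $f_2$ | $f_3$ | $f_4$ |
|---|---|---|---|---|
| $s_1$ | 1 | | | |
| $s_2$ | | 1 | | |
| $s_3$ | | | 1 | |
| $s_4$ | | | | 1 |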



Such a diagnostic matrix is the optimal solution. It ensures the strongly
isolating distinguishability of faults. The negative test result indicates directly
the specific fault:

$$v(s_j) = 1 \Leftrightarrow z(f_j) = 1. \tag{3.83}$$

The subset of negative results defines unequivocally the subset of the existing 
faults. 

In practice, such a matrix is impossible to realise. For each one of the 
faults, there should exist a sensor detecting the fault. However, it is techni- 
cally impossible to build diagnostic tests sensitive to one fault only. Moreover, 
widening the set of measurement devices by m elements means also widen- 
ing the set of faults by the same number of elements, since the faults of 
measurement lines are within the set of all faults. 

3.4.2. Distinguishability of system states based 
on the binary table of states 

The table of states (Table 3.2) defines diagnostic signal pattern values in 
all states of the system or in a defined subset of the states. Signal values 
contained in the column of the table of states are the signatures of the system 
state.

Similarly to fault distinguishability, one can define the distinguishability
of the states of the diagnosed system (Koscielny, 1991; 2001). Two states $z_i$
and $z_n$ are indistinguishable (remain in the relation $R_{NZ}$) with the given
binary diagnostic matrix if and only if the columns of the table of states are
identical:

$$z_i R_{NZ} z_n \Leftrightarrow [z_i, z_n \in Z] \wedge \forall_{s_j \in S} \; [v_j(z_i) = v_j(z_n)]. \tag{3.84}$$

Let us notice that the unitary diagnostic matrix leads to the
distinguishability of all states of the system. In such a case, the system
state is unequivocally defined by the results of all tests:

$$z = \{z(f_1), z(f_2), \ldots, z(f_K)\} = \{v(s_1), v(s_2), \ldots, v(s_K)\}. \tag{3.85}$$

In general, obtaining the distinguishability of all states of the system is im- 
possible. 

If the existence of several simultaneous faults (multiple faults) is possible 
during diagnosing, it is necessary to consider the distinguishability of system 
states. Assuming that only the state of complete efficiency as well as states 
with single faults can exist, the analysis of the distinguishability of system 
states consists in investigating fault distinguishability. 

3.4.3. Fault distinguishability based on the information system 

Let us define the notions of the unconditional and conditional
indistinguishability of faults in the FIS. The faults $f_k, f_m \in F$ are
unconditionally indistinguishable in the FIS with respect to the diagnostic
signal $s_j \in S$ if and only if the value of the function $r$ (2.76) fulfils
the condition

$$r(f_k, s_j) = r(f_m, s_j), \tag{3.86}$$

i.e., the subsets of the values of the function $r$ for the $j$-th diagnostic
signal and the faults $f_k$ and $f_m$ are identical, $V_{kj} = V_{mj}$.

The faults $f_k, f_m \in F$ are conditionally indistinguishable in the FIS
with respect to the diagnostic signal $s_j \in S$ if and only if the value of the
function $r$ (2.76) fulfils the condition

$$r(f_k, s_j) \cap r(f_m, s_j) \neq \emptyset \Leftrightarrow V_{kj} \cap V_{mj} \neq \emptyset. \tag{3.87}$$

The conditional indistinguishability of faults with respect to the signal $s_j$
means that two given faults are indistinguishable for certain values of this
signal that fulfil the condition $v_j \in V_{kj} \cap V_{mj}$. However, for other
values of the diagnostic signal $s_j$: $v_j \notin V_{kj} \cap V_{mj}$, the same
faults are distinguishable.

The faults $f_k, f_m \in F$ are indistinguishable (unconditionally
indistinguishable) in the FIS with respect to every diagnostic signal
$s_j \in S$ if and only if

$$\forall_{s_j \in S} \; r(f_k, s_j) = r(f_m, s_j), \tag{3.88}$$

which can be written as $f_k R_N f_m$.

The definition of unconditional indistinguishability can also be presented
in the following form:

$$f_k R_N f_m \Leftrightarrow V(f_k) = V(f_m). \tag{3.89}$$

The above definition is identical with the definition of indistinguishability
introduced by Pawlak (1983). With the two-value evaluation of test
results, it is an equivalent of the definition of fault indistinguishability
applied in (Koscielny, 1991; 1993), which uses the notion of quotient sets. The
signatures of unconditionally indistinguishable faults are identical. The
definition of conditional indistinguishability in the FIS is introduced below.

The faults $f_k, f_m \in F$ are conditionally indistinguishable in the FIS
with respect to all diagnostic signals $s_j \in S$ if and only if the subsets of
their values that correspond to the faults $f_k$ and $f_m$ have a common part
for every signal, and these faults are not unconditionally indistinguishable:

$$\forall_{s_j \in S} \; r(f_k, s_j) \cap r(f_m, s_j) \neq \emptyset \;\wedge\; \exists_{s_j \in S} \; r(f_k, s_j) \neq r(f_m, s_j), \tag{3.90}$$

which can be written as $f_k R_{WN} f_m$. The same definition can be expressed
by the following notation:

$$f_k R_{WN} f_m \Leftrightarrow \forall_{s_j \in S} \; V_{kj} \cap V_{mj} \neq \emptyset \;\wedge\; \exists_{s_j \in S} \; V_{kj} \neq V_{mj}. \tag{3.91}$$

The conditional indistinguishability of faults means that there may appear
values of diagnostic signals that fulfil the condition
$\forall_{s_j \in S} \; v_j \in V_{kj} \cap V_{mj}$,
for which the two given faults are indistinguishable. However, other
diagnostic signal values for which the same two faults are distinguishable
are possible. The following condition is then fulfilled:

$$\exists_{s_j \in S} \; \big[ (v_j \in V_{kj}) \wedge (v_j \notin V_{mj}) \big] \vee \big[ (v_j \notin V_{kj}) \wedge (v_j \in V_{mj}) \big]. \tag{3.92}$$

Any two faults are distinguishable in the FIS if there exists a diagnostic
signal for which the subsets of values that correspond with these faults are
separable:

$$\exists_{s_j \in S} \; r(f_k, s_j) \cap r(f_m, s_j) = \emptyset. \tag{3.93}$$

In the information system shown in Table 3.6, the faults $f_1$ and $f_3$
are unconditionally indistinguishable, the faults $f_2$ and $f_4$ are
conditionally indistinguishable, and the fault $f_5$ is unconditionally
distinguishable from all of the others. The relation of the unconditional
indistinguishability of faults (which is the relation of equivalence) divides
the set of faults into elementary blocks that contain the subsets of
indistinguishable faults. If the set of faults is replaced in the information
system FIS by the set of elementary blocks, such a system is called (according
to Pawlak's definition) the FIS representation, and will be denoted by the
FIS* symbol: $FIS^* = \langle E, S, V_S, r^* \rangle$.

Table 3.6. Example of the information system

| S/F | $f_1$ | $f_2$ | $f_3$ | $f_4$ | $f_5$ | $V_j$ |
|---|---|---|---|---|---|---|
| $s_1$ | 0 | 1 | 0 | 1 | 1 | {0, 1} |
| $s_2$ | -1, +1 | -1, +1 | -1, +1 | +1 | 0 | {0, -1, +1} |
| $s_3$ | 1 | 0 | 1 | 0 | 1 | {0, 1} |



In the case of the multi-value classification of diagnostic signals, an
unequivocal definition of indistinguishability is thus not possible. It depends
on the combinations of the signal values. For some of the combinations, the
obtained distinguishability is higher, and for others it is lower. Elementary
blocks defined by the relation of unconditional indistinguishability define the
highest distinguishability. The lowest distinguishability is defined by the
relation of conditional indistinguishability.
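
The relations defined above can be evaluated directly on the data of Table 3.6. The sketch below is an illustration only: it stores each subset $V_{kj}$ as a Python set, and the function name and the returned labels are assumptions of this example.

```python
# Value subsets V_kj = r(f_k, s_j) read from Table 3.6.
FIS = {
    "f1": {"s1": {0}, "s2": {-1, +1}, "s3": {1}},
    "f2": {"s1": {1}, "s2": {-1, +1}, "s3": {0}},
    "f3": {"s1": {0}, "s2": {-1, +1}, "s3": {1}},
    "f4": {"s1": {1}, "s2": {+1}, "s3": {0}},
    "f5": {"s1": {1}, "s2": {0}, "s3": {1}},
}


def relation(fk, fm, fis=FIS):
    """Classify a pair of faults according to (3.88), (3.90) and (3.93)."""
    a, b = fis[fk], fis[fm]
    if all(a[s] == b[s] for s in a):
        return "unconditionally indistinguishable"   # identical subsets for every signal
    if all(a[s] & b[s] for s in a):
        return "conditionally indistinguishable"     # common part for every signal
    return "distinguishable"                         # separable subsets for some signal


print(relation("f1", "f3"))
print(relation("f2", "f4"))
print(relation("f5", "f2"))
```

The faults $f_2$ and $f_4$ cannot be told apart when $s_2 = +1$ is observed, whereas the value $s_2 = -1$ points to $f_2$ only.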



3.4.4. Fault distinguishability based on pattern recognition 
in the space of diagnostic signals 

In the case of fault isolation with the use of pattern recognition methods, the 
defined regions (pattern pictures) in the space of diagnostic signals correspond 
with particular faults or states of the system. It is possible to formulate the 
following general conditions concerning the distinguishability of faults (states 
of the system): 

• If the pattern pictures of two faults (states) are identical in the diag- 
nostic signal space, then these faults are unconditionally indistinguishable. 

• If the pattern pictures of two faults (states) are separable in the diag- 
nostic signal space, then these faults are unconditionally distinguishable. 

• If the pattern pictures of two faults (states) have a common subregion 
in the diagnostic signal space, then these faults are conditionally distinguish- 
able. They are indistinguishable in the common region but are distinguishable 
in regions belonging to one pattern picture only. Distinguishability depends 
therefore on the registered values of diagnostic signals. 
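
The three conditions above can be sketched for one particular description of pattern pictures. The assumption here, made only for illustration, is that each pattern picture is an axis-aligned box given by one closed interval per diagnostic signal; other descriptions (e.g., clusters or fuzzy regions) would need different tests.

```python
def compare_regions(box_a, box_b):
    """Compare two pattern pictures given as lists of (low, high) intervals.

    Each list holds one closed interval per diagnostic signal.
    """
    if box_a == box_b:
        return "unconditionally indistinguishable"
    # Boxes share a subregion only if their intervals overlap on every axis.
    overlap = all(
        lo_a <= hi_b and lo_b <= hi_a
        for (lo_a, hi_a), (lo_b, hi_b) in zip(box_a, box_b)
    )
    return "conditionally distinguishable" if overlap else "unconditionally distinguishable"


# Hypothetical pattern pictures in a two-dimensional diagnostic signal space.
fault_a = [(0.0, 1.0), (0.0, 1.0)]
fault_b = [(0.5, 1.5), (0.5, 1.5)]
fault_c = [(2.0, 3.0), (0.0, 1.0)]

print(compare_regions(fault_a, fault_b))
print(compare_regions(fault_a, fault_c))
```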

In the case of binary diagnostic signals, pattern regions can only coincide
or be separate. This corresponds to the conditions defined for
the binary diagnostic matrix. For multi-value diagnostic signals, the above
conditions reduce to the conditions given for the information system.
Therefore, the above conditions are a generalisation of the earlier
definitions of distinguishability to continuous diagnostic signals. In such a
case, the mathematical form of the distinguishability conditions depends on
the pattern picture description method.

3.4.5. Fault distinguishability improvement by taking
the dynamics of symptoms into account 

The order of occurring symptoms is vital information that is worth using in 
the process of diagnosing. Different symptom appearance orders can char- 
acterise indistinguishable faults (that have identical fault signatures). This 
is illustrated in Fig. 3.17. Taking symptom dynamics into account can im- 
prove fault distinguishability. In many cases, it can also lower the time of 
diagnosing. The problem is described in Chapter 18. 




Fig. 3.17. Sequences of symptoms occurring for different
faults having the same signatures $V(f_k) = [1, 0, 1, 1]$

A monitoring algorithm that takes symptom dynamics into account was 
presented by Koscielny and Zakroczymski (2000; 2001). It uses minimum and 
maximum symptom time values for particular faults. 
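
The idea of using minimum and maximum symptom times can be illustrated with a small sketch. It is not the algorithm of Koscielny and Zakroczymski; it only assumes, hypothetically, that for each fault every symptom appears within a known delay window after the fault onset, and it checks whether a single onset time can explain all observed symptom times. The fault names, signals and time values are invented for the example.

```python
# Hypothetical minimum and maximum symptom delays (seconds after fault onset).
# Both faults have the same binary signature [1, 0, 1, 1] over (s1, s2, s3, s4).
SYMPTOM_WINDOWS = {
    "f_a": {"s1": (0.0, 2.0), "s3": (3.0, 6.0), "s4": (7.0, 10.0)},
    "f_b": {"s1": (5.0, 9.0), "s3": (0.0, 2.0), "s4": (2.0, 5.0)},
}


def is_consistent(observed, windows):
    """Check whether one fault onset time can explain all observed symptom times."""
    if not observed or not set(observed) <= set(windows):
        return False
    # Symptom s observed at time t implies: t - max_delay <= onset <= t - min_delay.
    earliest_onset = max(t - windows[s][1] for s, t in observed.items())
    latest_onset = min(t - windows[s][0] for s, t in observed.items())
    return earliest_onset <= latest_onset


def candidates(observed, all_windows=SYMPTOM_WINDOWS):
    """Return the faults whose time windows are consistent with the observations."""
    return [f for f, windows in all_windows.items() if is_consistent(observed, windows)]


observed = {"s1": 100.5, "s3": 104.0, "s4": 108.0}
print(candidates(observed))
```

With the symptom order s1, s3, s4 only the hypothetical fault `f_a` remains; the reverse order would point to `f_b`, although both faults share one binary signature.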



3.5. Methods of the structural design of the set 
of detection algorithms 

This chapter deals with the problems of designing the detection algorithm set
in such a way that it ensures adequate fault distinguishability. Usually it is
advisable to obtain the distinguishability of all possible faults of the system.
However, that is not always possible.

Fault distinguishability depends on the set of diagnostic algorithms used 
in the process of diagnosing. The set should therefore be properly designed. 
A vital restriction in such a case is the set of measured signals. The higher 
the number of the known process variables, the more detection algorithms 
can be obtained. In many cases, the set of detection algorithms is designed 
for the existing set of measured process variables. 

Different approaches to the problem are described below. In the case 
of fault detection algorithms based on analytical models, methods of the 
structural design of the residual set and banks of observers are developed. 
Another approach consists in a systematic choice of partial models, which 
can also be applied to neural and fuzzy models. The problems of minimising 
the detection algorithm set without reducing fault distinguishability are also 
dealt with. 

3.5.1. Generation of secondary residuals based on physical equations 

On the grounds of the equations of primary residuals, secondary residuals,
which are sensitive to a different subset of faults than primary residuals, can
be created. Primary residuals correspond to elementary equations describing
the phenomena that take place within the system. Secondary residuals are
created as a result of mathematical operations carried out on physical
equations which use at least one common process variable. For example, taking
the value of the flow calculated from the equation (1.9) and introducing it
into the equation (1.10), it is possible to obtain the following residual:

$$r_5 = F(U) - \alpha_{12} S_{12} \sqrt{2g(L_1 - L_2)} - A_1 \frac{dL_1}{dt}. \tag{3.94}$$

The residual is sensitive to the faults of the actuator set and the control line
but does not depend on the fault of the measuring line $F$. Therefore, this
residual is sensitive to the following faults:

$$F(r_5) = \{f_2, f_3, f_5, f_6, f_7, f_8, f_9, f_{12}\}. \tag{3.95}$$

The set is different than the sets for the residuals $r_1$ and $r_2$ (compare
the equations (2.58) and (2.59)). The same form of the residual can be obtained
as a result of the operation $r_5 = r_2 - r_1$, where the primary residuals are
defined by the equations (2.5) and (2.6). Summing both sides of the equations
(1.10) and (1.11), it is possible to obtain a new residual having the following
form:

$$r_6 = F - \alpha_{23} S_{23} \sqrt{2g(L_2 - L_3)} - A_1 \frac{dL_1}{dt} - A_2 \frac{dL_2}{dt}. \tag{3.96}$$

The aim of the above operation was to eliminate the part
$\alpha_{12} S_{12} \sqrt{2g(L_1 - L_2)}$ from the equation. The new residual
is sensitive to the following faults:

$$F(r_6) = \{f_1, f_2, f_3, f_4, f_{10}, f_{12}, f_{13}\}. \tag{3.97}$$

Let us notice that the residual $r_6$ corresponds to the balance equation
written for the first two tanks (Fig. 1.3). It can be created as a result
of the operation $r_6 = r_2 + r_3$, where the primary residuals are defined by
the equations (2.6) and (2.7).

Secondary residual equations which correspond to the balances for the 
second and the third tank, as well as for all tanks, can be expressed in a similar 
way. By creating the binary diagnostic matrix, it is easy to examine the effect 
of these residuals on the increase of fault distinguishability (Koscielny, 2001). 

3.5.2. Choice of a structural set of residuals generated on the basis 
of parity equations 

Let us assume that the linear models of the system that take the effect of
faults into account are known (Chapter 2). Primary residuals generated using
these models have the following form:

$$r(s) = y(s) - G(s)u(s) = G_F(s) f(s). \tag{3.98}$$

The equation $r(s) = y(s) - G(s)u(s)$ is defined as the computational form of
the residual, while the equation $r(s) = G_F(s) f(s)$ as the internal form.

Each of the primary residuals in the internal form is the sum of products:

$$r_j(s) = G_{jF}(s) f(s) = G_{j1}(s) f_1(s) + \cdots + G_{jk}(s) f_k(s) + \cdots + G_{jK}(s) f_K(s), \tag{3.99}$$

where $G_{jk}(s) = y_j(s)/f_k(s)$. The transmittances $G_{jk}$ are equal to zero
if the $j$-th residual is not sensitive to the $k$-th fault.

In order to improve fault distinguishability, it is necessary to design 
additional secondary residuals. They should ensure the possibility of the dis- 
tinguishability of faults that are indistinguishable on the basis of primary 
residuals. Therefore they must be sensitive to one indistinguishable fault and 
insensitive to the other. 

The secondary residuals can be obtained by multiplying the primary
residuals by the matrix $V$:

$$r^*(s) = V(s) r(s) = V(s) G_F(s) f(s). \tag{3.100}$$

Therefore, any $j$-th secondary residual is generated as a linear combination
of the primary residuals:

$$r_j^*(s) = v_j(s) r(s) = v_{j1} r_1(s) + v_{j2} r_2(s) + \cdots + v_{jq} r_q(s). \tag{3.101}$$

It is possible to ensure the sensitivity of particular residuals only to the
defined faults and the lack of sensitivity to other faults as well as
immeasurable inputs (disturbances) by an adequate choice of the vector $v_j$.
This operation is the essence of the structurisation of parity equations.

If the $j$-th residual is to be insensitive to the fault $f_k$, it is necessary
to fulfil the following condition:

$$v_j(s) G_{Fk}(s) = 0, \tag{3.102}$$

where $G_{Fk}(s)$ is the column of the matrix $G_F$ that corresponds to the fault
$f_k$: $G_{Fk} = [G_{1k}(s), G_{2k}(s), \ldots, G_{qk}(s)]^T$, while $v_j$ is the
$j$-th row of the matrix $V$: $v_j = [v_{j1}, v_{j2}, \ldots, v_{jq}]$. The
condition (3.102) can be described by

$$v_{j1} G_{1k}(s) + v_{j2} G_{2k}(s) + \cdots + v_{jq} G_{qk}(s) = 0. \tag{3.103}$$

Let us present a simple example of secondary residual design as an
illustration. Let us assume that the primary residuals in the internal form
are as follows:

$$r_1 = (1 - z^{-1}) f_1 - 2 f_3 - f_4,$$
$$r_2 = (1 - 2z^{-1}) f_2 - f_3 - 4 f_4.$$

The binary diagnostic matrix corresponding to these equations is presented
in Table 3.7. It can be seen that the signatures of the faults $f_3$ and
$f_4$ are identical. These faults cannot be distinguished from one another
with the above parity equations.



Table 3.7. Incidence matrix (binary diagnostic matrix)
for primary residuals

| R/F | $f_1$ | $f_2$ | $f_3$ | $f_4$ |
|---|---|---|---|---|
| $r_1$ | 1 | | 1 | 1 |
| $r_2$ | | 1 | 1 | 1 |



It is possible to create additional parity equations. In order to eliminate
the effect of the fault $f_3$ on the designed secondary residual $r_3$, one
should choose the vector $v_j$ that fulfils the condition (3.103). Let us
assume that $v_j = [1, -2]$. Then

$$\begin{aligned} r_3 = [1, -2] \begin{bmatrix} r_1 \\ r_2 \end{bmatrix} &= (1 - z^{-1}) f_1 - 2 f_3 - f_4 - 2(1 - 2z^{-1}) f_2 + 2 f_3 + 8 f_4 \\ &= (1 - z^{-1}) f_1 - f_4 - 2(1 - 2z^{-1}) f_2 + 8 f_4 \\ &= (1 - z^{-1}) f_1 - 2(1 - 2z^{-1}) f_2 + 7 f_4. \end{aligned}$$

The obtained residual depends on the faults $f_1$, $f_2$ and $f_4$ but does
not depend on the fault $f_3$. One can similarly design the next residual,
which is insensitive to the fault $f_4$. It is possible to obtain it by
multiplying the residual vector by the vector $v_j = [4, -1]$, which gives

$$r_4 = 4(1 - z^{-1}) f_1 - (1 - 2z^{-1}) f_2 - 7 f_3.$$
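
The two linear combinations can be checked with a short sketch. It assumes a simple representation chosen for this illustration: each fault coefficient is a list `[c0, c1]` standing for the polynomial `c0 + c1*z^-1`, and the weights of the combination are constants.

```python
# Primary residuals in the internal form; each coefficient [c0, c1]
# represents the polynomial c0 + c1 * z^-1.
r1 = {"f1": [1, -1], "f3": [-2, 0], "f4": [-1, 0]}
r2 = {"f2": [1, -2], "f3": [-1, 0], "f4": [-4, 0]}


def combine(weights, residuals):
    """Return sum_i weights[i] * residuals[i], dropping faults that cancel out."""
    result = {}
    for weight, residual in zip(weights, residuals):
        for fault, poly in residual.items():
            acc = result.setdefault(fault, [0] * len(poly))
            for i, coeff in enumerate(poly):
                acc[i] += weight * coeff
    return {fault: poly for fault, poly in result.items() if any(poly)}


r3 = combine([1, -2], [r1, r2])   # designed to be insensitive to f3
r4 = combine([4, -1], [r1, r2])   # designed to be insensitive to f4

# Incidence (binary diagnostic) matrix of the four residuals.
faults = ["f1", "f2", "f3", "f4"]
incidence = [[1 if f in r else 0 for f in faults] for r in (r1, r2, r3, r4)]
for row in incidence:
    print(row)
```

In the resulting matrix the columns of any two faults differ at two positions, so the faults $f_3$ and $f_4$, indistinguishable with the primary residuals alone, receive different signatures.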

The binary diagnostic matrix for the set containing four residuals is
presented in Table 3.8. The matrix ensures the strongly isolating
distinguishability of all faults. However, some restrictions appear during
the design of secondary residuals (Gertler, 1998; Koscielny, 2001):

a) if any one of the subsets of faults has an effect exclusively on one
primary residual, then these faults are indistinguishable (any one of the
secondary residuals can be sensitive to all of the faults or to none of them);

b) if two inputs in the equations of residuals are linearly dependent, then
the faults which correspond to them are indistinguishable;

c) if the number of faults is higher than the number of the outputs of the
system (the condition is always fulfilled in reality), then it is not possible
to obtain an arbitrary structure of the binary diagnostic matrix. For example,
the diagonal structure is impossible to obtain.



Table 3.8. Incidence matrix for the set of four residuals

R/F    f1    f2    f3    f4
r1     1           1     1
r2           1     1     1
r3     1     1           1
r4     1     1     1



Therefore, the required fault distinguishability cannot always be obtained
without introducing additional measurement lines.

3.5.3. Banks of observers 

Banks of observers (Frank, 1987; 1990) are also applied in order to ensure 
adequate fault distinguishability. The known structures of banks of observers 
are applied to the diagnostics of measurement systems (Instrument Fault 
Detection, IFD), actuators (Actuator Fault Detection, AFD), and installation 
components (Component Fault Detection, CFD). An example of a bank of 
observers used for the diagnostics of measurement lines is presented below. 

The Dedicated Observer Scheme (DOS), introduced by Clark (1978)
for the diagnostics of measurement system faults, is shown in Fig. 3.18. It
contains a set of observers. Each one of the observers re-creates the values of
all outputs on the basis of a complete set of inputs as well as one output that
differs in particular observers. The number of observers equals or is lower than
the number of outputs. If the system is observable, the q-tuple redundancy
of the vectors of estimated output signals is achieved.

Fig. 3.18. Fault diagnostics with the use of the state observer bank DOS

A fault of a measurement line connected with an observer causes the
inconsistency of all measured and estimated signals at the observer output.
On the other hand, a fault of any other measurement line leads to inconsistency
between the measured and the calculated value of this signal only. This
dependence is the basis for fault isolation. Due to the q-tuple redundancy of
output signals (q observers), it is possible to distinguish multiple faults. A
simple logical system is used to infer the faults. A disadvantage
of a bank of observers designed for a specific group of faults is that
it neglects the effect of other kinds of faults.
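The inference rule described above can be sketched as a small voting procedure. The code below is a minimal illustration, not the logic of any particular implementation: the function name `isolate_sensor_faults`, the Boolean flag matrix and the assumption that observer i is driven by measurement line i are all hypothetical. The rule needs at least three observers to be unambiguous.

```python
def isolate_sensor_faults(flags):
    """flags[i][j] is True when observer i finds output j inconsistent.

    Assumption: observer i is driven by measurement line i. Line k is
    declared faulty when every *other* observer flags output k.
    """
    q = len(flags)
    faulty = []
    for k in range(q):
        votes = [flags[i][k] for i in range(q) if i != k]
        if votes and all(votes):
            faulty.append(k)
    return faulty


# Hypothetical single fault of line 1 in a bank of three observers:
flags = [
    [False, True, False],  # observer 0: only output 1 disagrees
    [True, True, True],    # observer 1 is driven by the faulty line
    [False, True, False],  # observer 2: only output 1 disagrees
]
print(isolate_sensor_faults(flags))
```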

3.5.4. Design of a structured set of detection algorithms 
based on partial models 

The above approaches are applied in the case of linear models of the system. 
They are not suitable for the design of a structured set of residuals in the 
case of fault detection on the basis of nonlinear analytical models, as well as 
neural and fuzzy models, especially when all possible kinds of faults are to 
be taken into account. 

The following procedure (Koscielny, 2001) is applied to the design of a 
structured set of residuals: 

a) Partial models (analytic, neural or fuzzy ones) are designed for the 
smallest possible parts of the system with the use of available measurement 
signals. If a model is to be created, a set of measurements has to exist that
should allow modelling the static and dynamic properties of each one of the 
subsystems with defined accuracy. The set of partial models should cover the 
whole system. Each one of the partial models is the basis for a detection 
algorithm. 

b) The set of the detected faults is defined for each detection algorithm. 
An expert’s knowledge about the effect of faults on residuals is used. In the 
case of physical models, one can also apply equations of residuals that take 
the effect of faults into account. 

c) If partial models of certain parts of the system cannot be obtained, 
one should consider the possibility of introducing additional measurement 
signals. If this is not possible for technical or economic reasons, one should 
design tests that apply heuristic knowledge about the system. 

d) The achieved fault distinguishability is analysed. If it is not adequate, 
additional tests are designed on the basis of joining partial models for neigh- 
bouring parts of the system. 

e) Such a procedure is carried out further if necessary, and successive 
models applied in the algorithms of tests encompass larger and larger parts 
of the system. 

The presented procedure of the design of the detection algorithm set has 
an iterative character. The binary diagnostic matrix is also created during 
the design. The matrix is defined by the subsets of faults $F(s_j)$ detected
by particular detection algorithms. The information system FIS can also be 
designed instead of the binary diagnostic matrix. After the primary set of 
detection algorithms has been designed, one carries out the analysis of the 
obtained fault detectability and distinguishability, and eliminates unneces- 
sary residuals or adds new ones that increase the degree of fault detectability 
or distinguishability. Often the set of the analysed faults is modified. The 
process is repeated many times until the required properties are achieved. 

Such design methods were presented in (Koscielny, 2001) with the help 
of an example of the diagnostics of a three-tank set with the use of detection 
algorithms based on physical models. 



3.5.5. Minimising the set of detection algorithms 

In many cases, the designed set of detection algorithms can be minimised 
without reducing fault distinguishability. Algorithms of searching for a reduct 
in the information system can be applied (Pawlak, 1983; 1991). They are 
suitable also for minimising the binary diagnostic matrix, which is a special 
case of the information system. 

Each one of the equivalence relations divides the set of systems X of 
the information system into two separable classes - elementary blocks of the 
information system. Two information systems are equivalent if and only if 
their elementary blocks are identical. 
The set of attributes $B \subseteq A$ is called the reduct of the set of attributes
$A$ if and only if they generate the division of faults into the same elementary
blocks, and there exists no $C \subset B \subseteq A$ that would ensure the same division.
The system

$IS^B = (X, B, V^B, r^B)$ (3.104)

is called the reduced system, where $V^B = \bigcup_{b \in B} V_b$ and $r^B : X \times B \to V^B$.

The information system can possess more than one reduct. 

In the case of the FIS, the relation of unconditional indistinguishability
divides the set of faults $F$ into separate elementary blocks. The set of diagnostic
signals $S^P \subseteq S$ is called the P-type reduct of the set of signals $S$
if and only if they generate the division of faults into the same elementary
blocks, and there exists no $S^* \subset S^P \subseteq S$ that would ensure the same division
(Koscielny, 2001). The system

$FIS^P = (F, S^P, V^P, r^P)$ (3.105)

is called the reduced system, where $V^P = \bigcup_{s_j \in S^P} V_j$, $V_j$ is the domain (the set of
values) of the attribute $s_j$, and $r^P : F \times S^P \to V^P$.

Elementary blocks determine the highest distinguishability. The subsets 
of conditionally indistinguishable faults in the complete FIS as well as in the 
reduced FIS (P-type) do not have to be identical. 

The Q-type reduct of the set of diagnostic signals $S$ is the smallest
possible set of diagnostic signals $S^Q \subseteq S$ for which the subsets of unconditionally
indistinguishable faults (elementary blocks) as well as the subsets of
conditionally indistinguishable faults are the same as for the complete FIS.
The following system corresponds to the Q-type reduct:

$FIS^Q = (F, S^Q, V^Q, r^Q)$. (3.106)

In this system, $V^Q$ and $r^Q$ are defined similarly to $V^P$ and $r^P$. The Q-type
reduct can contain more diagnostic signals than the P-type reduct:
$S^P \subseteq S^Q \subseteq S$.
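For a small binary diagnostic matrix, reducts that preserve the elementary blocks can be found by exhaustive search. The sketch below is an illustration only: it uses the matrix of Table 3.8, treats two faults as unconditionally indistinguishable when their signatures are identical, and enumerates all inclusion-minimal signal subsets. The names `blocks` and `reducts` are hypothetical, and a brute-force search is an assumption made for brevity, not the reduct algorithm of the cited works.

```python
from itertools import combinations

# Binary diagnostic matrix of Table 3.8: signal -> set of faults it detects.
matrix = {
    "r1": {"f1", "f3", "f4"},
    "r2": {"f2", "f3", "f4"},
    "r3": {"f1", "f2", "f4"},
    "r4": {"f1", "f2", "f3"},
}
faults = ["f1", "f2", "f3", "f4"]


def blocks(signals):
    """Partition of the faults into elementary blocks for the given signals."""
    groups = {}
    for f in faults:
        signature = tuple(f in matrix[s] for s in signals)
        groups.setdefault(signature, set()).add(f)
    return {frozenset(g) for g in groups.values()}


def reducts():
    """All inclusion-minimal signal subsets that keep the full partition."""
    names = sorted(matrix)
    target = blocks(names)
    found = []
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            if any(set(r) <= set(subset) for r in found):
                continue  # a smaller reduct is already contained in this subset
            if blocks(subset) == target:
                found.append(subset)
    return found


print(reducts())
```

For this matrix no pair of residuals separates all four faults, while every triple does, which illustrates that an information system can possess more than one reduct.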



3.6. Fault identification 

Fault identification takes place after the phase of isolation. Fault identifi- 
cation consists in defining the fault size and possibly the character of its 
changeability in time. Carrying out fault identification is possible in the case 
of inference based on system models. 

Residual equations that take the effect of faults into account in an analytical
way supply the most information for such inference. The estimation
of the size of the isolated faults can also be carried out when the
residuals are generated on the basis of analytical models known only
in their computational form, or on the basis of fuzzy and neural models. In this case,
the analytical relationship between the residuals and the faults is
not known. In every case, however, it is necessary to know the diagnostic
relation, i.e., to know which residuals are sensitive to the faults whose occurrence
has been ascertained at the stage of the isolation process.

Let us assume that in the simplest case, only single faults exist. For the
recognised fault, it is necessary to define the set of diagnostic signals
sensitive to it:

$S(f_k) = \{s_j \in S : f_k R_{FS} s_j\}$. (3.107)

If diagnostic signals result from the evaluation of residual values, a residual
sensitive to the given fault corresponds to each one of the signals belonging to
$S(f_k)$. Therefore, $R(f_k)$ is the set of residuals that detect the k-th fault:

$R(f_k) = \{r_j \in R : r_j = q(f_k)\}$. (3.108)

One can carry out fault identification on the basis of residuals belonging 
to this set. Two cases of inference will be analysed: when the analytical de- 
pendence of the residuals on the faults is known, and when such dependence 
is not known. 

3.6.1. Residual equations 

Let us assume that the linear or nonlinear equations of residuals that take the 
effect of faults into account are known. Each one of the residuals is therefore 
defined by the formula: 

$r_j = q(y, u, f, t)$. (3.109)

Let us assume that the fault $f_k$ has been ascertained during isolation, i.e.,

$f_k \neq 0, \quad f_i = 0, \quad i = 1, 2, \ldots, K, \quad i \neq k$. (3.110)

The following dependence is therefore fulfilled for all residuals belonging to
the set $R(f_k)$:

$r_j = q(y, u, f_k, t) \neq 0$. (3.111)

If the inverse equation can be determined, one obtains the dependence
of the fault $f_k$ on the residual:

$f_k = q^*(y, u, r_j, t)$. (3.112)

Since the input and output signal values are known, the above equation
can be written in the following form:

$f_k = q^*(r_j, t), \quad y = y_0, \quad x = x_0$. (3.113)

The formula describes the dependence of the fault on the residual, the course
of which has been registered. It is therefore possible to determine directly
the size of the fault as a function of time on the basis of this formula. Even
if an inverse model of the form (3.113) cannot be determined, one can
determine the fault size numerically using the following formula:

$r_j = q(y, u, f_k, t), \quad y = y_0, \quad x = x_0$. (3.114)
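One possible way of solving (3.114) numerically for the fault size at a single sampling instant is bisection. The sketch below is a minimal illustration under stated assumptions: the residual model `q` is a made-up monotone function of the fault (with the measured signals already substituted), and `solve_fault` is a hypothetical helper, not a routine from the source.

```python
def solve_fault(q, r_measured, lo, hi, tol=1e-9):
    """Find f in [lo, hi] with q(f) == r_measured by bisection.

    Assumes q is continuous and that q(lo) - r and q(hi) - r have opposite signs.
    """
    g_lo = q(lo) - r_measured
    if g_lo * (q(hi) - r_measured) > 0:
        raise ValueError("the bracket does not contain a solution")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = q(mid) - r_measured
        if g_lo * g_mid <= 0:
            hi = mid                 # the root lies in the lower half
        else:
            lo, g_lo = mid, g_mid    # the root lies in the upper half
    return 0.5 * (lo + hi)


def q(f):
    """Hypothetical residual model for fixed y and u (illustration only)."""
    return 2.0 * f + f ** 3


print(solve_fault(q, 3.0, -5.0, 5.0))  # close to 1.0, since q(1) = 3
```

Repeating the call for successive samples of the residual yields the course of the fault size in time.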

For the identification of a single fault it is sufficient to know only one
dependency of the form (3.113) or (3.114). If the set $R(f_k)$ contains more
residuals, then the fault value can be calculated as the mean value over all
residuals that are sensitive to this fault:

$f_k = \frac{1}{N} \sum_{j:\, r_j \in R(f_k)} q_j^*(r_j, t), \quad y = y_0, \quad x = x_0, \quad N = |R(f_k)|$. (3.115)

Let us illustrate the above deliberations with examples of fault identification 
in the three-tank set (Fig. 1.3). Let us consider equations of residuals gener- 
ated on the grounds of models that take the effect of faults into account - 
(1.24) to (1.27). The equations have the following form: 

ri = F + fi-k, [5(J7 + /s) + fe] + A + 



$r_2 = A_1 \frac{d(L_1 + f_2)}{dt} - (F + f_1) + \alpha_{12}(S_{12} + f_9)\sqrt{2g[(L_1 + f_2) - (L_2 + f_3)]} + f_{12}$, (3.117)

$r_3 = A_2 \frac{d(L_2 + f_3)}{dt} - \alpha_{12}(S_{12} + f_9)\sqrt{2g[(L_1 + f_2) - (L_2 + f_3)]} + \alpha_{23}(S_{23} + f_{10})\sqrt{2g[(L_2 + f_3) - (L_3 + f_4)]} + f_{13}$, (3.118)

$r_4 = A_3 \frac{d(L_3 + f_4)}{dt} - \alpha_{23}(S_{23} + f_{10})\sqrt{2g[(L_2 + f_3) - (L_3 + f_4)]} + \alpha_3(S_3 + f_{11})\sqrt{2g(L_3 + f_4)} + f_{14}$. (3.119)

Let us assume that the existence of the fault $f_{14}$, i.e., a leak from the
third tank, has been ascertained during fault isolation. Therefore $f_{14} \neq 0$
and $f_i = 0$ for $i = 1, 2, \ldots, 13$. Only the residual $r_4$ is sensitive to this fault:

$r_4 = A_3 \frac{dL_3}{dt} - \alpha_{23} S_{23} \sqrt{2g(L_2 - L_3)} + \alpha_3 S_3 \sqrt{2g L_3} + f_{14}$, (3.120)

which means the following:

$f_{14} = r_4 - \left[ A_3 \frac{dL_3}{dt} - \alpha_{23} S_{23} \sqrt{2g(L_2 - L_3)} + \alpha_3 S_3 \sqrt{2g L_3} \right]$. (3.121)

It is possible to ascertain based on the equation (1.12) that the three
components in square brackets sum to zero (this is the balance of flows in
the third tank that was determined without taking the effect of faults into
account). Therefore, $f_{14} = r_4$. The residual value denotes the leak from the
third tank in the units of the flow.

Let us assume now that the fault $f_{10}$ has been ascertained, i.e., the
partial clogging of the channel existing between the second and the third tank
has occurred. Therefore, $f_{10} \neq 0$, and $f_i = 0$ for $i = 1, 2, \ldots, 9, 11, \ldots, 14$.
The residuals $r_3$ and $r_4$ are sensitive to this fault:

$r_3 = A_2 \frac{dL_2}{dt} - \alpha_{12} S_{12} \sqrt{2g(L_1 - L_2)} + \alpha_{23}(S_{23} + f_{10})\sqrt{2g(L_2 - L_3)}$, (3.122)

$r_4 = A_3 \frac{dL_3}{dt} - \alpha_{23}(S_{23} + f_{10})\sqrt{2g(L_2 - L_3)} + \alpha_3 S_3 \sqrt{2g L_3}$. (3.123)



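It is possible to determine formulas that define the fault $f_{10}$ using the above
equations:

$f_{10} = \frac{r_3 - \left[ A_2 \frac{dL_2}{dt} - \alpha_{12} S_{12} \sqrt{2g(L_1 - L_2)} + \alpha_{23} S_{23} \sqrt{2g(L_2 - L_3)} \right]}{\alpha_{23} \sqrt{2g(L_2 - L_3)}} = \frac{r_3}{\alpha_{23} \sqrt{2g(L_2 - L_3)}}$, (3.124)

$f_{10} = \frac{r_4 - \left[ A_3 \frac{dL_3}{dt} - \alpha_{23} S_{23} \sqrt{2g(L_2 - L_3)} + \alpha_3 S_3 \sqrt{2g L_3} \right]}{-\alpha_{23} \sqrt{2g(L_2 - L_3)}} = \frac{r_4}{-\alpha_{23} \sqrt{2g(L_2 - L_3)}}$. (3.125)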



The expressions in square brackets correspond to the system model in
the state of complete efficiency and are equal to zero. If the residual value
as well as the values of the levels $L_2$ and $L_3$ in the tanks and the value of
the flow coefficient $\alpha_{23}$ are known, it is possible to calculate the size of the fault $f_{10}$.
By repeating this calculation at successive signal sampling instants,
one can obtain the course of the changes of the fault value in time.

Calculations obtained on the basis of the equations (3.124) and (3.125) 
can differ one from another due to modelling errors and measurement noise. 
The mean value of the fault can be obtained when one uses the equa- 
tion (3.115). 
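The two estimates of $f_{10}$ and their mean can be computed as in the following sketch. It is an illustration only: the numerical values of the flow coefficient, the levels and the fault are hypothetical, and the function names are not from the source.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def f10_from_r3(r3, alpha23, L2, L3):
    """Estimate of f10 according to (3.124)."""
    return r3 / (alpha23 * math.sqrt(2 * G * (L2 - L3)))


def f10_from_r4(r4, alpha23, L2, L3):
    """Estimate of f10 according to (3.125)."""
    return r4 / (-alpha23 * math.sqrt(2 * G * (L2 - L3)))


def f10_mean(r3, r4, alpha23, L2, L3):
    """Mean of both estimates, in the spirit of (3.115)."""
    estimates = [f10_from_r3(r3, alpha23, L2, L3),
                 f10_from_r4(r4, alpha23, L2, L3)]
    return sum(estimates) / len(estimates)


# Hypothetical operating point and a synthetic, noise-free fault:
alpha23, L2, L3, f10_true = 0.6, 0.4, 0.2, -1.0e-4
r3 = alpha23 * f10_true * math.sqrt(2 * G * (L2 - L3))
r4 = -r3
print(f10_mean(r3, r4, alpha23, L2, L3))
```

With real data the two estimates differ because of modelling errors and noise, and the mean smooths this difference.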



3.6.2. Residuals without the knowledge of the effect of faults 

Equations of residuals that take the effect of faults into account are usually
not known. One then has to use the computational form, which expresses the residual
in terms of the measured variables. On the other hand, the diagnostic relation is known,
and the set $S(f_k)$ of diagnostic signals that detect the fault (3.107) as well
as the set $R(f_k)$ of residuals sensitive to this fault (3.108) can be given. In
such a case, fault identification consists in estimating the fault size on the
basis of residual values.

An elementary index of the symptom size that testifies to the fault size
is the ratio of the value of the residual $r_j$ to its threshold value $K_j$. As the
measure of the fault size, one can assume, for instance, the mean value of
such elementary indices for all residuals that correspond to signals belonging
to the set $S(f_k)$:

$\hat{f}_k = \frac{1}{N} \sum_{j:\, r_j \in R(f_k)} \frac{r_j}{K_j}, \quad N = |R(f_k)|$. (3.126)

The residual values at the moment of diagnostic inference or mean values 
in the window having a defined length can be taken in the above equation. 
Fuzzy logic can also be applied to fault size estimation in this method. 
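A minimal sketch of the index (3.126) is given below. The function name and the sample numbers are hypothetical; the residual values passed in may be instantaneous values or window means, as noted above.

```python
def fault_size_index(residuals, thresholds):
    """Mean of r_j / K_j over the residuals sensitive to the isolated fault."""
    if not residuals or len(residuals) != len(thresholds):
        raise ValueError("need one threshold per residual and at least one residual")
    ratios = [r / k for r, k in zip(residuals, thresholds)]
    return sum(ratios) / len(ratios)


# Two residuals sensitive to the fault, with their thresholds (made-up values):
print(fault_size_index([3.0, 8.0], [1.0, 2.0]))
```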



3.7. Monitoring the system state 

Diagnoses should be formulated in real time for industrial processes. The con- 
tinuous or discrete diagnosing of the system state is called on-line diagnostics 
(Koscielny, 1995). Further on, however, the term monitoring the system state 
will be used. The monitoring of the state is a continuous (periodically re- 
peated) formulation of diagnoses about the system state or about the changes 
of the system state in an automatic way by a diagnosing computer system. 
It means the diagnosing is carried out on-line in real time. 

A vital difficulty encountered during industrial process state monitoring 
is the lack of the possibility of applying special test inputs (only working 
signals are used), as well as the necessity of taking the diagnosed system 
dynamics into account. The diagnosis should be formulated taking the times 
of the propagation of fault symptoms into consideration. 

During the monitoring of the system state, frequent changes of the available
set of process variables (measured signals) $X$, and hence of the set of available
diagnostic signals $S$, take place. Diagnostic signals that monitor previously
recognised faults also become temporarily useless. They cannot be applied
until the state of efficiency is restored. The set of diagnostic signals (tests)
available at the moment of monitoring is a subset of the set of signals generated
by all detection algorithms. Changes of the functioning structure of
the system (e.g., temporary switching off of some technical devices) also
cause changes of the set of faults $F$ that should be recognised.

It is therefore impossible to define indistinguishable elementary
blocks or unchangeable diagnostic inference rules once and for all. They frequently change
during the operation of the system. It is necessary to run the entire operation
of diagnostic inference in real time, taking into account the changes of the sets
of process variables $X$, diagnostic signals $S$, as well as the set of faults $F$.

At the n-th moment of time, the system state is defined by the set
$F(1)_n$ of the existing faults. The set $F(0)_n$ of faults that are not present
is its complement in the set of all faults $F$. In an ideal case, the diagnosis
at the n-th moment of time should indicate the subset of faults that have
appeared since the moment of the generation of the previous diagnosis:

$\Delta F(1)_n = F(1)_n - F(1)_{n-1} = \{f_k : [z(f_k)_n = 1] \wedge [z(f_k)_{n-1} = 0]\}$, (3.127)

as well as the subset of faults that have been eliminated during this period
of time:

$\Delta F(0)_n = F(0)_n - F(0)_{n-1} = \{f_k : [z(f_k)_n = 0] \wedge [z(f_k)_{n-1} = 1]\}$. (3.128)

In a general case, the momentary diagnosis therefore contains two elements:

$DGN_n = \{\Delta F(1)_n, \Delta F(0)_n\}$. (3.129)
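The two elements of the momentary diagnosis are plain set differences, which the following sketch illustrates. It works on the sets of existing faults at two consecutive diagnoses; the function name and the fault labels are hypothetical.

```python
def momentary_diagnosis(existing_prev, existing_now):
    """Return (newly appeared faults, eliminated faults) between two diagnoses."""
    appeared = existing_now - existing_prev    # state changed from 0 to 1
    eliminated = existing_prev - existing_now  # state changed from 1 to 0
    return appeared, eliminated


prev = {"f2", "f7"}   # faults existing at moment n-1 (made-up example)
now = {"f7", "f10"}   # faults existing at moment n
print(momentary_diagnosis(prev, now))
```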

The diagnosis formulated at the n-th moment of time concerns, how- 
ever, the previous moment of time, which is associated with time delays in 
the appearances of fault symptoms in dynamic systems, and with the time 
necessary for diagnostic synthesis. One should notice that successive faults 
could have appeared at the moment of the n-th diagnosis generation, and 
that the symptoms of these faults have not been detected yet by the realised 
diagnostic tests. The system state that is re-created in the supervising pro- 
cess is not therefore completely consistent with the real state. This is caused 
not only by the diagnosed system’s dynamics but also by the limited distin- 
guishability of states, the uncertainty of test results, etc. 

The monitoring of the system state can therefore be examined as two
separate processes of diagnostic inference. The first one consists in isolating
the appearing faults, and the second one in recognising the events of a
return to the normal state (state of efficiency). The return to the state
of efficiency requires human intervention, e.g., a repair or replacement
of the damaged elements. The system operating personnel is notified about
such changes. Thus the problem of recognising the return to the normal
state is of much less practical importance than the isolation of the appearing
faults.



3.8. Summary 

This chapter presents a synthetic look at the problems of process (system)
diagnostics. Fault detection methods based on the knowledge of the system,
as well as those that do not use a model, have been discussed. Methods of fault isolation
and system state recognition have been presented. The problems
of fault (state) distinguishability and methods of choosing the detection
algorithms that lead to higher distinguishability have been considered. The
specifics of monitoring the system state have also been characterised. The
description of particular problems is brief. Not all of the methods and problems
could be taken into account in this synthetic frame. However,
the study of this chapter should help to understand issues that are discussed
in the following chapters.
METHODS OF SIGNAL ANALYSIS

Wojciech CHOLEWA*, Jozef KORBICZ**, 
Wojciech MOCZULSKI*, Anna TIMOFIEJCZUK*



4.1. Introduction 

Information obtained as a result of observing objects that are subjects of a
diagnostic process is a basis for inference in technical diagnostics. Depending
on the kind of diagnostic observations considered, information can deal with
physical quantities that are connected with the object operation (e.g., the flow
intensity of a medium in a suction connector). Information can also be related
to residual processes that are effects of each object operation (e.g., the level
of acoustic emission during the operation of a cutting tool). To generalise the
numerous kinds of observations, one can consider a signal (diagnostic signal)
as a material carrier that makes it possible to transmit information about the
observed object or process. In most cases this carrier is a set of arbitrary quantities.
The application of information included in the signal requires its appropriate
description. Signals may be effectively described by sets of values of features
that are results of signal analysis. Examples of features of a stochastic signal
can be the estimates of that process.

In order to correctly interpret the observations that are to be performed,
it is recommended to clearly distinguish between the classes of the features
considered (e.g., vehicle speed) and their values (e.g., 75 mph). The values of
features can be of different types. They can be represented either by quantity
values (exact or approximate) or nominal ones. In the general case a feature
value can be a single value (e.g., maximum value of speed) or a function value
(e.g., the plot of the density of speed distribution). It should be stressed that
the correct choice of the type of feature values influences the possibility of
their comparison.

A signal and its features can be considered (observed, recorded and analy- 
sed) within several domains. Time and frequency domains are most often ap- 
plied. The modal domain is used for observing such objects for which a spatial 
presentation of the observed physical values is useful. It is often stated that 
signal descriptions in these three domains are equivalent to each other. The 
only reason for simultaneous consideration of these domains is the possibility 
of an easier interpretation of the meaning of numerous feature values in the 
given domain. An example of that can be the comparison of the diagram of 
changes of signal values in the micro time with the signal spectrum. However, 
it must be stressed that the equivalence of these domains exists only when 
the influence of noise may be neglected (Cholewa and Moczulski, 1993). 

Signals considered as time series of physical quantities require a clear 
distinction between two classes of time, called (Cholewa, 1983) respectively 
the micro and macro time, which are also known as the dynamic time and the 
lifetime (Cempel, 1982), respectively. It is assumed that micro time moments 
correspond to the values of signal features, whereas changes of these features 
are observed in the macro time. The changes may be an effect of the evolution 
of the observed object or process. The time domain can be any linearly ordered
set, and its elements are called time moments. Examples of such elements
can be the number of rotations of the machine shaft or the commonly used 
quantity of the pumped medium. A particular example of this concept is 
ordinary time. 

Signal analysis is understood as an operation that gives as a result a set 
of signal features. The basis of each analysis method is the assumption about 
an appropriate signal model. 

• Universal signal models can be determined in the form of models that 
are independent of the examined object and the observed signal. An example 
of this class of models is the representation of a signal as the sum of harmonic 
components, which is characteristic for methods based on the Fourier trans- 
form. The methods connected with the identification of universal models are 
generally named non-parametric ones. 

• An interesting group of signal models is based on the assumption that
the model can be acquired without the need of taking into consideration
information about the examined object. Examples of such models are AutoRegressive
models (AR), Moving-Average models (MA) or AutoRegressive
Moving-Average models (ARMA). The application of such a model may be
interpreted as searching for a linear filter that makes it possible to consider
the analysed signal as a result of filtering white noise. These methods
are generally named parametric ones. The difference between these and non-parametric
methods consists in the fact that signal models are represented
as functions of a small number of estimated feature values (parameters).
• An intensively developing group of models is based on the assumption
that the model should take into account specific results caused by the operation
of the object considered, which generates the observed signals.
Since changes of operating conditions usually influence the way in which a
signal is generated, considering such models is particularly justified when the
conditions of the operation of the examined object vary in time. Examples
of varying conditions are the run up or run down of rotating machinery. The
models may be identified either with the use of non-parametric methods (e.g.,
short-time spectra) or parametric ones (e.g., ARMAX models).

• The next group of models includes models that take into consideration 
the operation of the examined object. The basic assumption in this case is 
to take advantage of information on the sequences of the object’s operation. 
The goal of that assumption is to average synchronously the observed sig- 
nals, which makes it possible to eliminate accidental components that are 
not connected with a cyclic process. An example of the application of such 
models is averaging signals that describe the motion of the journal in the 
hydrodynamic bearing bushing, or the estimation of the average trajectory 
of the centre of the journal. 
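The interpretation of a parametric model as a linear filter driven by white noise can be illustrated with the simplest case, a first-order autoregressive model. The sketch below is only an assumption-laden example: it simulates $x_t = a x_{t-1} + e_t$ with Gaussian white noise and recovers the single parameter $a$ by least squares; the function names and the chosen values are hypothetical.

```python
import random


def simulate_ar1(a, n, seed=0):
    """Filter white Gaussian noise with the AR(1) recursion x[t] = a*x[t-1] + e[t]."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(a * x[-1] + rng.gauss(0.0, 1.0))
    return x


def estimate_ar1(x):
    """Least-squares estimate of the AR(1) coefficient from one signal."""
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    return num / den


signal = simulate_ar1(a=0.8, n=20000)
print(estimate_ar1(signal))  # should lie near the simulated value 0.8
```

The whole signal is thus summarised by one estimated parameter, which is the sense in which such methods are called parametric.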



4.2. Signal classification 

There always exists a model that can be assigned to an individual signal. The 
expected identification of the model determines the way in which the signal 
is analysed. Taking into account the kind of that model, the following groups 
of signals can be distinguished: 

• deterministic signals, which can be described exactly with the use of 
mathematical models that do not include any random values, 

• random signals, which are described by models of stochastic processes
that represent the set of signals $\{x_\alpha(t) \mid \alpha \in \Gamma\}$, where $\Gamma$ is the set of signal
indices.

The classification of random signals is usually connected with checking the
stationarity of signals, and detecting the type of non-stationarity when the
signal is non-stationary. It is important that the detection of non-stationarity
necessitates the application of special and sophisticated methods of signal
analysis. The non-stationary signal is a signal whose statistical features are
time dependent. The stationary signal in the broad sense (also called a weakly
stationary signal) is defined as a signal whose expected value is constant and
equal to the mean value, and whose autocorrelation function $R_{xx}(\tau)$ depends
only on the time delay $\tau$. These definitions are expressed as follows:

$E\{x(t)\} = \mathrm{idem}(t) = \mu_x$, (4.1)

$R_{xx}(t_1, t_1 + \tau) = R_{xx}(t_2, t_2 + \tau) = R_{xx}(\tau)$, (4.2)
where $E\{\cdot\}$ is the expectation operator. The mean value $\mu_x(t)$
of a random signal $x_\alpha(t)$ is defined as the expectation value of the random
variable $\{x_\alpha(t) \mid \alpha \in \Gamma\}$:

$\mu_x(t) = E\{x_\alpha(t) \mid \alpha \in \Gamma\} = E\{x_\alpha(t)\}$. (4.3)

The autocorrelation of a random signal is calculated on the basis of the
formula

$R_{xx}(t, t + \tau) = E\{x_\alpha(t) x_\alpha(t + \tau) \mid \alpha \in \Gamma\}$. (4.4)

Stationarity in the narrow sense requires that the above conditions also be
satisfied by higher-order statistical moments.
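The ensemble definitions (4.3) and (4.4) can be tried out on a synthetic process. The sketch below is an illustration under stated assumptions: the process is a harmonic signal with a uniformly distributed random initial phase (a standard example of a weakly stationary process), the ensemble is finite, and all names and numerical values are hypothetical.

```python
import math
import random

rng = random.Random(1)
F0 = 2.0  # hypothetical signal frequency, Hz
phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(5000)]


def x(alpha, t):
    """Realisation alpha of a harmonic signal with a random initial phase."""
    return math.cos(2.0 * math.pi * F0 * t + phases[alpha])


def ensemble_mean(t):
    """Estimate of the mean value (4.3) over the finite ensemble."""
    return sum(x(a, t) for a in range(len(phases))) / len(phases)


def ensemble_autocorr(t, tau):
    """Estimate of the autocorrelation (4.4) over the finite ensemble."""
    return sum(x(a, t) * x(a, t + tau) for a in range(len(phases))) / len(phases)


# The mean should not depend on t, and the autocorrelation only on tau:
print(ensemble_mean(0.1), ensemble_mean(0.37))
print(ensemble_autocorr(0.1, 0.05), ensemble_autocorr(0.8, 0.05))
```

For this process the theoretical mean is zero and the autocorrelation is $0.5\cos(2\pi f_0 \tau)$, so the printed pairs should agree up to the sampling error of the finite ensemble.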

Moreover, within the group of processes represented by stationary
signals one can also distinguish processes characterised as ergodic and non-ergodic
ones. An ergodic process is a process for which it is possible, with
probability equal to 1, to estimate its features with the use of a single, long
enough signal representing the process. The features defined in such a way
are considered to be representative of the process. Taking into account the
definition of the ergodic process and the operator of the running average, one
can state that the criterion of ergodicity is satisfied when the features of the
signal estimated with the use of that operator, i.e.,

$\langle v(x_\alpha(t)) \rangle_T = \frac{1}{T} \int_{(T)} v(x_\alpha(u))\, du$, (4.5)

are equal to the features estimated with the use of the operator of the expectation
value that is calculated within the set of all realisations of the analysed
signal. Accordingly, the stationary signal is ergodic when the following
conditions are satisfied:



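$\forall \alpha \in \Gamma: \quad \mu_x = \mu_x^{(\alpha)}$, (4.6)

$\forall \alpha \in \Gamma,\ \tau > 0: \quad R_{xx}(\tau) = R_{xx}^{(\alpha)}(\tau)$. (4.7)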

Among deterministic signals two special kinds of signals can be considered: 
periodic signals and non-periodic ones. 

For periodic signals it is possible to define the period $T > 0$ that enables
us to describe the signal with the use of a time function $x(t)$ with the
following property:

$x(t + T) = x(t)$. (4.8)

Within this group one can distinguish polyharmonic signals, which can
be defined by the linear combination of $k$ harmonic components:

$x(t) = \sum_{i=1}^{k} X_i \cos(2\pi f_i t + \theta_i)$, (4.9)

where $X_i$, $f_i = i \cdot f_0$ ($f_0$ is the basic frequency) and $\theta_i$ are respectively the magnitude,
frequency and initial phase of the i-th signal component.
The polyharmonic signal is thus characterised by the frequency $f_0$ and the frequencies
of consecutive components, which are multiples of $f_0$.
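The periodicity property (4.8) of a polyharmonic signal (4.9) can be verified numerically, as in the short sketch below. The magnitudes, phases and basic frequency are made-up values and the function name is hypothetical.

```python
import math


def polyharmonic(t, f0, magnitudes, phases):
    """Sum of k harmonic components with frequencies f0, 2*f0, ..., k*f0."""
    return sum(
        X * math.cos(2.0 * math.pi * (i * f0) * t + theta)
        for i, (X, theta) in enumerate(zip(magnitudes, phases), start=1)
    )


f0 = 5.0                      # basic frequency, Hz (illustrative)
magnitudes = [1.0, 0.5, 0.2]  # X_1, X_2, X_3
phases = [0.0, 0.3, -1.1]     # initial phases in radians
T = 1.0 / f0                  # period of the whole signal

t = 0.0123
print(polyharmonic(t, f0, magnitudes, phases),
      polyharmonic(t + T, f0, magnitudes, phases))
```

Both printed values agree up to rounding, because every component completes an integer number of cycles within $T = 1/f_0$.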

Non-periodic signals can be divided into: 

• transient signals, characterised by a non-zero limited support, which 
is relatively short in comparison to signal duration. Outside this range sig- 
nal values are equal to zero. An example of such a signal is the course of 
acceleration values recorded on the body of a car during a collision. 

• signals characterised by an infinite duration; here it is assumed that 
the time of observation is significantly shorter than the time of signal 
duration. 



4.3. Initial pre-processing of signals 

Observing the operation of an examined object and its interaction with the
environment is usually performed with the use of different transducers that
convert signals to be analysed. The initial pre-processing of signals should
ensure a correct signal-to-noise ratio and at the same time the elimination
of unwanted properties of static and dynamic characteristics of sensors as
well as the correctness of the complete measurement line (measurement devices).
In order to satisfy these requirements, it is necessary to use the widest possible
range of the dynamics of the measuring channel and to apply
correction procedures and band filters. This makes it possible to remove
trends and signal components which belong to uninteresting frequency ranges
(Lyons, 1997; Otnes and Enochson, 1972). The scope of initial signal pre-processing
should be determined at an early stage of the design of the measurement
channel.

It is also worth stressing that the need for the dynamic elimination of
noise often appears, since the character and distribution of the noise
cannot be predicted at the design stage. A characteristic example is a
measurement set-up used for observing the motion of the journal in the
bushing of a slide bearing. The result of the observation, usually presented
in the form of an orbit, has an important diagnostic meaning. The journal is
usually observed with non-contact eddy-current probes that measure relative
displacement. Insufficient homogeneity of the outer layer of the journal,
which occurs quite often, is the main source of systematic (periodic) noise
that significantly deforms the observed orbit. Hence, an important stage of
initial signal processing should be the subtraction of a correcting signal
that is estimated in an adaptive way.

4.3.1. Analogue-to-digital conversion of signals

Contemporary measurement as well as signal analysis procedures and devices 
are mostly formed on the assumption that a digital signal is directly provided 
by a signal source or estimated with the use of an Analogue-to-Digital (A/D) 
converter. The A/D processing requires taking into consideration the side 
effects of these operations, which can significantly influence the obtained 
results or make their correct interpretation difficult. 

Analogue-to-digital processing comprises two main operations: sampling in
the time domain and the quantization of the signal amplitude. Sampling
determines the time instants at which discrete signal values are read out.
The interval between two consecutive instants (the sampling interval) is in
most cases fixed and should be chosen so that the continuous signal can be
reproduced from the sampled values.

An incorrect choice of the sampling interval often makes it impossible to
reproduce the continuous signal and causes the aliasing effect shown in
Fig. 4.1. The criteria for choosing the sampling interval follow from the
Kotielnikov-Shannon theorem (also called the sampling theorem). According to
it, a signal $x(t)$ that contains no components at frequencies greater than
$f_{\max}$ is unambiguously determined by its instantaneous values if the
distance between two consecutive values (in the time domain) is no longer
than $1/(2 f_{\max})$. A distance of exactly $1/(2 f_{\max})$ is the limiting
case: a component at $f_{\max}$ may then be equal to zero at every instant
at which discrete signal values are determined. Consequently, the sampling
frequency $f_s$ should satisfy the condition $f_s > 2 f_{\max}$. Other
variants of the sampling theorem are also known (Szabatin, 2000). In
practice the condition $f_s > 2.56 f_{\max}$ is recommended.
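The folding of an undersampled component can be checked numerically. The
short Python sketch below is an illustration only; the function name
apparent_frequency and the numerical values (a 70 Hz component sampled at
100 Hz) are assumptions chosen for the example and do not come from the
cited sources.

```python
import math


def apparent_frequency(f_signal, f_s):
    """Frequency (Hz) at which a sampled sinusoid of frequency f_signal appears."""
    f_folded = f_signal % f_s              # the sampled spectrum is periodic in f_s
    return min(f_folded, f_s - f_folded)   # mirror about the Nyquist frequency f_s / 2


f_s = 100.0                                # assumed sampling frequency in Hz
t = [k / f_s for k in range(50)]           # sampling instants

# A 70 Hz cosine violates f_s > 2 * f_max; its samples coincide with those
# of a 30 Hz cosine, so the two signals cannot be told apart after sampling.
x_70 = [math.cos(2 * math.pi * 70.0 * tk) for tk in t]
x_30 = [math.cos(2 * math.pi * 30.0 * tk) for tk in t]
max_diff = max(abs(a - b) for a, b in zip(x_70, x_30))
```

With these values the 70 Hz component is mirrored to 30 Hz, and max_diff
differs from zero only by rounding errors.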

If this condition is not satisfied, the aliasing effect may occur. The
phenomenon consists in mirroring those harmonic components of the signal
whose frequencies are greater than the Nyquist frequency, defined as

$$f_N = f_s / 2.$$

Amplitude quantization consists in mapping signal values that belong to a
continuous amplitude set onto values that belong to a finite $q$-element
set. The elements of the latter set are called quantization levels
(Fig. 4.2). The result of this operation is a digital signal. Quantization
introduces a characteristic disturbance of the signal called quantization
noise. The variance of this noise is strictly related to the width of the
quantization intervals.
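The dependence of the noise on the interval width can be illustrated with a
small experiment. The sketch below assumes a uniform quantizer with rounding
to the nearest level and a quantization error that is uniformly distributed
within one interval; under these assumptions (which are added here and are
not stated in the text) the mean square error is close to the well-known
value step^2 / 12. The names quantize and step are hypothetical.

```python
import random


def quantize(x, step):
    """Map x onto the nearest level of a uniform grid with the given step."""
    return step * round(x / step)


rng = random.Random(1)       # fixed seed so that the experiment is repeatable
step = 0.1                   # assumed width of one quantization interval
samples = [rng.uniform(-1.0, 1.0) for _ in range(100_000)]

errors = [quantize(x, step) - x for x in samples]
noise_variance = sum(e * e for e in errors) / len(errors)
expected_variance = step ** 2 / 12   # uniform error in (-step / 2, step / 2)
```

Halving the step reduces the noise variance roughly four times, which is
why the number of quantization levels determines the usable dynamic range.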



4.3.2. Filtering 

Signals recorded during the operation of industrial objects very often
contain components such as trends, harmonic components or rapidly varying
components similar to noise. The operation of the object and its observation
are usually accompanied by different side effects. An example is a
measurement made with long cabling near electric or electronic devices,
which introduce unwanted components at the line frequency.

Fig. 4.1. Influence of a sampling interval on a discrete signal:
(a) correct interval, (b) incorrect interval (too large)

Fig. 4.2. Amplitude quantization and quantization noise

It is strongly advised to remove the influence of these side factors be- 
fore performing a diagnosis. Relatively simple tools for the elimination of 
some discrete components belonging to given frequency bands are digital 
filters (Fig. 4.3). 

Fig. 4.3. Representation of a digital filter (input u(k))

The relationship between the input and output signals of a filter is defined
by the following convolution (Rutkowski, 1994):

$$y(k) = \sum_{l=-\infty}^{\infty} u(l)\, h(k - l), \tag{4.10}$$

where $h(k)$ is the impulse response of the filter. Accordingly, for the
input signal $x(k) = e^{j\omega k}$ the relationship between the signals
$x(k)$ and $y(k)$ in the frequency domain takes the form

$$y(k) = \sum_{l=-\infty}^{\infty} h(l)\, e^{j\omega (k - l)} = e^{j\omega k} \sum_{l=-\infty}^{\infty} h(l)\, e^{-j\omega l} = x(k)\, H(e^{j\omega}), \tag{4.11}$$

where $H(e^{j\omega})$ is the frequency response of the filter. The form of
$H$ depends on the filtering method and on the filter type.
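The statement that a complex exponential passes through the filter scaled by
the frequency response can be verified directly. The sketch below is a
minimal illustration for a causal impulse response of finite length; the
three-point response h and the frequency omega are arbitrary values assumed
for the example, and the function names are hypothetical.

```python
import cmath
import math


def convolve_causal(h, u):
    """y(k) = sum over l of u(l) h(k - l), with h and u equal to zero outside their lists."""
    return [
        sum(h[m] * u[k - m] for m in range(len(h)) if 0 <= k - m < len(u))
        for k in range(len(u))
    ]


def frequency_response(h, omega):
    """H(e^{j omega}) = sum over l of h(l) exp(-j omega l)."""
    return sum(h[l] * cmath.exp(-1j * omega * l) for l in range(len(h)))


h = [0.25, 0.5, 0.25]                    # assumed impulse response (simple low-pass)
omega = math.pi / 4                      # assumed normalised angular frequency
x = [cmath.exp(1j * omega * k) for k in range(20)]

y = convolve_causal(h, x)
H = frequency_response(h, omega)

# After the start-up transient (the first len(h) - 1 samples) y(k) = x(k) H.
steady_state_error = max(abs(y[k] - x[k] * H) for k in range(len(h) - 1, len(x)))
```

For this particular h the response equals one at omega = 0 and zero at
omega = pi, which is the behaviour expected of a low-pass filter.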

Among the well-known non-optimal solutions one can distinguish a group of
filters of the FIR type (Finite Impulse Response), generally defined by the
difference equation

$$y(k) = \sum_{i=1}^{M} a_i\, u(k - i). \tag{4.12}$$

The second group of filters, which are of the IIR type (Infinite Impulse
Response), is given by the formula

$$y(k) = \sum_{i=1}^{N} b_i\, u(k - i) + \sum_{j=1}^{M} a_j\, y(k - j). \tag{4.13}$$

IIR filters can also be expressed by means of the transfer function $H(q)$:

$$H(q) = \frac{Y(q)}{U(q)} = \frac{\sum_{k=0}^{N} b_k q^{-k}}{\sum_{k=0}^{M} a_k q^{-k}}. \tag{4.14}$$
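The difference between the two filter groups is easiest to see in their
responses to a unit impulse. The sketch below is an illustration under
stated assumptions: the non-recursive filter uses delayed inputs only, the
recursive filter additionally feeds back past outputs with a plus sign, and
the coefficient values are arbitrary. The function names and the sign
convention are choices made for this example, not a definitive
implementation.

```python
def fir_filter(a, u):
    """y(k) = sum_{i=1..M} a[i-1] * u(k-i), with u(k) = 0 for k < 0."""
    return [
        sum(a[i - 1] * u[k - i] for i in range(1, len(a) + 1) if k - i >= 0)
        for k in range(len(u))
    ]


def iir_filter(b, a, u):
    """y(k) = sum_i b[i-1] * u(k-i) + sum_j a[j-1] * y(k-j) (assumed sign convention)."""
    y = []
    for k in range(len(u)):
        acc = sum(b[i - 1] * u[k - i] for i in range(1, len(b) + 1) if k - i >= 0)
        acc += sum(a[j - 1] * y[k - j] for j in range(1, len(a) + 1) if k - j >= 0)
        y.append(acc)
    return y


impulse = [1.0] + [0.0] * 9

# FIR: the response repeats the coefficients and then vanishes exactly.
fir_response = fir_filter([0.5, 0.3, 0.2], impulse)

# IIR: the feedback term keeps the response alive; here it decays as 0.5^(k-1).
iir_response = iir_filter([1.0], [0.5], impulse)
```

The FIR response is zero from the fifth sample onwards, whereas the IIR
response only approaches zero, which is the origin of the two names.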



As in the case of analogue filters, one can enumerate the following types
of digital filters: low-pass, high-pass, band-pass and band-stop ones. The
initial stage of designing such filters includes defining the required
filter properties. These requirements are usually given in the frequency
domain. Noise and disturbances are in most cases different in nature from
the analysed signal.

The design of a real low-pass filter starts with defining (Fig. 4.4) a
frequency pass band $(0, \omega_p)$, within which the amplitude
characteristics should approximate unity with an error no greater than
$\pm\delta_1$. That entails

$$1 - \delta_1 \leq |H(e^{j\omega})| \leq 1 + \delta_1, \qquad |\omega| \leq \omega_p. \tag{4.15}$$

Similarly, in the stop band the amplitude characteristics should approximate
zero with an error no greater than $\delta_2$:

$$|H(e^{j\omega})| \leq \delta_2, \qquad \omega_s \leq |\omega| \leq \pi. \tag{4.16}$$

The approximation of an ideal filter is always connected with the
introduction of a transition band of non-zero width $(\omega_s - \omega_p)$,
within which the amplitude characteristics change smoothly between the pass
and stop bands.




Fig. 4.4. Tolerance ranges for a real low-pass filter

The nonlinearity of the characteristics within the transition band is
typical of the described non-optimal digital filters. Therefore, filtering
measurement data is reasonable only if the advantages of filtration outweigh
the disadvantages caused by the distortion of information that is going to
be used in the diagnostic system. For this reason the correct choice of a
filter and its adjustment to diagnostic requirements are very important.

Examples of digital low-pass filters are Butterworth filters (Szabatin,
2000), which are flat in the pass band and monotonic in the entire active
range (Fig. 4.5). The amplitude characteristics of the analogue Butterworth
filter are defined as follows:

$$|H(j\omega)|^2 = \frac{1}{1 + (\omega / \omega_c)^{2N}}. \tag{4.17}$$

Increasing the order $N$ makes the filter characteristics steeper, and as a
result they become closer to the ideal ones (Fig. 4.5).

A more effective approach to non-optimal filtering is the application of
Tschebyshev filters, which make it possible to obtain a better approximation
of the ideal characteristics with a lower filter order. The amplitude
characteristics are in this case defined by the following formula:

$$|H(j\omega)|^2 = \frac{1}{1 + \varepsilon^2 V_N^2(\omega / \omega_c)}, \tag{4.18}$$

where $V_N(x)$ is a Tschebyshev polynomial of the $N$-th order defined by
the equation

$$V_N(x) = \cos(N \arccos x). \tag{4.19}$$




Fig. 4.5. Dependence of amplitude characteristics of
Butterworth filters on the filter order N

The Tschebyshev filter is unambiguously determined by three parameters:
$\varepsilon$, $f_c = \omega_c / 2\pi$ and $N$. In typical tasks the
parameter $\varepsilon$ is defined by the error of linearity that is assumed
to be acceptable within the pass band, $f_c$ is the limit frequency, and the
order $N$ is usually chosen so as to satisfy the requirements in the stop
band.
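The gain in selectivity offered by the Tschebyshev filter can be checked by
evaluating both amplitude characteristics for the same order. The sketch
below assumes the usual squared-magnitude forms of the analogue Butterworth
and Tschebyshev (Chebyshev type I) characteristics and, for arguments
greater than one, the standard continuation cosh(N arcosh x) of the
polynomial; the function names and the values N = 4, epsilon = 1 are
assumptions made for the example.

```python
import math


def butterworth_gain_squared(omega, omega_c, order):
    """Squared amplitude characteristic of the analogue Butterworth filter."""
    return 1.0 / (1.0 + (omega / omega_c) ** (2 * order))


def tschebyshev_polynomial(order, x):
    """V_N(x) for x >= 0: cos(N arccos x) up to x = 1, cosh(N arcosh x) beyond."""
    if x <= 1.0:
        return math.cos(order * math.acos(x))
    return math.cosh(order * math.acosh(x))


def tschebyshev_gain_squared(omega, omega_c, order, epsilon):
    """Squared amplitude characteristic of the analogue Tschebyshev filter."""
    v = tschebyshev_polynomial(order, omega / omega_c)
    return 1.0 / (1.0 + (epsilon * v) ** 2)


omega_c = 1.0            # assumed limit (cut-off) angular frequency
order = 4                # assumed filter order
stop = 2.0 * omega_c     # a frequency inside the stop band

butter_stop = butterworth_gain_squared(stop, omega_c, order)
tscheb_stop = tschebyshev_gain_squared(stop, omega_c, order, epsilon=1.0)
```

At twice the limit frequency the Butterworth characteristic falls to 1/257,
while the Tschebyshev one is considerably lower because V_4(2) = 97; the
price is the ripple of the Tschebyshev characteristic inside the pass band.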

Optimal filter design is a more complex process. Examples include Wiener
and Kalman filters (Anderson and Moore, 1979). The determination of their
parameters requires additional information about the analysed process.
Diagnostic devices hardly ever include optimal filters, particularly Kalman
ones. They are rarely used at the stage of the initial processing of
signals; their important application is the estimation of state variables
(Pieczyński, 1999).

4.3.3. Smoothing 

The process of signal smoothing consists in removing high frequency compo- 
nents from the signal. Smoothing algorithms in contrast to filtering (Anderson 
and Moore, 1979) allow using past results of measurements as well as future 
ones with respect to the moment at which the signal is being smoothed. Dis- 
crete signals that are characterised by the Gaussian distribution are usually 
smoothed with the use of the least squares method. When the Gaussian dis- 
tribution of both the input signals and measurement noise is not assumed, 
linear smoothing is applied. 

Generally, there are three types of smoothing: 

• fixed-point smoothing - in this case the estimates of a signal $x(j)$ are
calculated for a given fixed time $j$, which is denoted by
$\hat{x}(j \mid j + N)$ for a constant $j$ and each $N$;

• smoothing with fixed delay, which is an on-line smoothing with a constant
delay $N$ between the moment the signal is received and the moment the
estimate is calculated, which is denoted by $\hat{x}(k - N \mid k)$ for each
$k$ and a constant $N$;

• smoothing with fixed range, which deals with the smoothing of a limited
set of data and is denoted by $\hat{x}(k \mid M)$ for a constant $M$ and
each $k$ within the range $0 \leq k \leq M$.

One of the simplest fixed-point methods is a method that uses approximation
polynomials, which is a generalisation of smoothing by means of the moving
average. The method consists in approximating (Fig. 4.6), with an $M$-th
degree polynomial, the measurement results that belong to a given range of
width $(2K + 1)\Delta t$. The measurements are placed symmetrically about
the smoothed point. The value of the polynomial in the middle of the studied
range is one individual point $P(0)$ of the smoothed signal. Moving the
range one point forward and repeating the calculations lets us estimate the
next point of the analysed signal.

Fig. 4.6. Approximation polynomial identification
and the determination of a smoothing point P(0)

In order to consider the smoothing problem formally, one assumes the $M$-th
degree polynomial $P(k)$, which for discrete time moments is defined as
follows:

$$P(k) = \sum_{m=0}^{M} b_m k^m. \tag{4.20}$$

The approximation problem for $(2K + 1)$ successive points of the sequence
$x_{-K}, \ldots, x_0, \ldots, x_K$ may be solved by minimising the following
quality index:

$$J = \sum_{k=-K}^{K} \bigl(x_k - P(k)\bigr)^2 = \sum_{k=-K}^{K} \Bigl(x_k - \sum_{m=0}^{M} b_m k^m\Bigr)^2. \tag{4.21}$$



Minimising the criterion (4.21) yields estimates of the unknown polynomial
coefficients $b_0, b_1, \ldots, b_M$. However, in order to estimate the
polynomial value at the smoothed point $P(0)$, only the coefficient $b_0$ is
required, which is expressed by the equation (Mańczak and Nahorski, 1983):

$$P(0) = b_0 = \sum_{k=-K}^{K} c_k x_k. \tag{4.22}$$

The coefficients $c_k$ depend only on $K$ and on the polynomial degree $M$,
so they may be determined independently of the smoothed sequence. As an
example one can give the polynomial $P(k)$ of degree $M = 1$. In that case
the coefficients are given by $c_k = 1/(2K + 1)$ for
$k = -K, \ldots, 0, \ldots, K$, so the method reduces to moving
$(2K + 1)$-point averaging.

Compared with more advanced methods (Anderson and Moore, 1979), the
application of this simple smoothing algorithm may cause some difficulty
with the choice of suitable $K$ and $M$ values. Moreover, the analysis of
the amplitude characteristics of smoothing algorithms based on approximating
polynomials shows that the initial decrease in the characteristics is
followed by ripples. Hence, some signal components may be attenuated more
effectively than others. A solution to this problem may be the multiple
repetition of the smoothing procedure.
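The smoothing weights discussed in this section can be computed once for a
given window half-width K and polynomial degree M. The sketch below solves
the normal equations of the least squares problem in exact rational
arithmetic; it is one possible way of obtaining the weights, not the
procedure of the cited authors, and the function names are hypothetical. It
assumes 2K + 1 > M so that the normal equations have a unique solution.

```python
from fractions import Fraction


def smoothing_coefficients(K, M):
    """Weights c_k, k = -K..K, such that P(0) = b_0 = sum_k c_k x_k."""
    ks = range(-K, K + 1)
    n = M + 1
    # Normal equations G z = e_0 with G[p][q] = sum_k k^(p+q); then c_k = sum_m z_m k^m.
    g = [[Fraction(sum(k ** (p + q) for k in ks)) for q in range(n)] for p in range(n)]
    rhs = [Fraction(1)] + [Fraction(0)] * M
    for col in range(n):                       # Gauss-Jordan elimination
        pivot = next(r for r in range(col, n) if g[r][col] != 0)
        g[col], g[pivot] = g[pivot], g[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for r in range(n):
            if r != col:
                factor = g[r][col] / g[col][col]
                g[r] = [a - factor * b for a, b in zip(g[r], g[col])]
                rhs[r] -= factor * rhs[col]
    z = [rhs[i] / g[i][i] for i in range(n)]
    return [sum(z[m] * k ** m for m in range(n)) for k in ks]


def smooth(x, K, M):
    """Smoothed values for all points that have K neighbours on both sides."""
    c = smoothing_coefficients(K, M)
    return [
        float(sum(ck * x[i + k] for ck, k in zip(c, range(-K, K + 1))))
        for i in range(K, len(x) - K)
    ]
```

For K = 2 and M = 1 the function returns five weights equal to 1/5, i.e.
the moving average, whereas M = 2 gives the weights (-3, 12, 17, 12, -3)/35,
which reproduce a quadratic sequence without distortion.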

4.3.4. Averaging 

Operations connected with averaging temporal signal values may be carried 
out both in the time and frequency domains. Averaging in the time domain 
may be divided into two groups: 

Smooth-out averaging, which in most cases is performed by means of low-pass
filtering. The goal of this operation is to remove short-term high
frequency components that appear in the signal. This kind of averaging is
often related to signal pre-processing.

Synchronous averaging, which is usually applied in the case of signals that
are effects of cyclic phenomena (e.g., signals recorded during the operation
of a gear, a cam mechanism or a rotating machine). Synchronous averaging may
help in the identification of repeating characteristic sequences of changes
of signal amplitudes.

Synchronous averaging is a method of signal analysis that requires not only
the observation of the averaged signal but also the parallel recording of a
synchronising signal that provides information about the beginning or a
given phase of consecutive cycles of the operation of the analysed object.
An example of such a signal is a signal containing impulses generated by a
key-phasor probe. The impulses result from the detection of markers on the
shaft surface, which makes it possible to determine the angular location of
the rotating shaft. The interval between two consecutive impulses of this
signal is the basic period of the analysed cycle of the object's operation.
Synchronous averaging consists in estimating the mean course of the signal
within a period corresponding to a given integer number of basic periods
(Fig. 4.7). This kind of averaging can also be carried out in the frequency
domain, an example being order tracking analysis (Gade et al., 1995).

The main goal of synchronous averaging is to remove non-synchronous
components (including random noise). It should be stressed that the method
should be applied only when there is a reason to assume that the noise which
is going to be removed is uncorrelated with the synchronising signal.
Synchronous averaging can also be interpreted as tracking band-pass
filtering, in which the characteristics of the filter bank depend on the
actual frequency of the signal (Moczulski, 1984). The method is often
applied to two-channel signals, which make it possible to determine the
instantaneous location of the centre of the journal in a slide bearing of
rotating machinery. These signals enable us to estimate the mean orbit of
the centre of the journal and to remove the noise caused by the
heterogeneity of the journal surface, which disturbs the operation of
eddy-current probes.

Fig. 4.7. Example of synchronous averaging: an averaged signal (a),
a synchronising signal (b) and a result of the averaging (c)
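The procedure can be sketched in a few lines. The example below assumes
that the synchronising signal has already been converted into a list of
sample indices marking the beginning of each cycle and that every cycle
contains the same number of samples; in practice the signal usually has to
be resampled to meet the second assumption. The cycle length, the number of
cycles and the disturbance are invented for the illustration.

```python
import math


def synchronous_average(signal, trigger_indices, cycle_length):
    """Mean course of the signal over the cycles that start at the trigger indices."""
    segments = [
        signal[s:s + cycle_length]
        for s in trigger_indices
        if s + cycle_length <= len(signal)
    ]
    return [sum(seg[i] for seg in segments) / len(segments) for i in range(cycle_length)]


CYCLE = 50        # assumed number of samples per cycle
N_CYCLES = 37     # assumed number of recorded cycles

# Synchronous part: one sine period per cycle. Non-synchronous part: a
# sinusoid with a period of 37 samples, unrelated to the cycle length.
pattern = [math.sin(2 * math.pi * i / CYCLE) for i in range(CYCLE)]
signal = [
    pattern[i % CYCLE] + 0.5 * math.sin(2 * math.pi * i / 37)
    for i in range(CYCLE * N_CYCLES)
]
triggers = list(range(0, len(signal), CYCLE))    # stands in for the key-phasor impulses

averaged = synchronous_average(signal, triggers, CYCLE)
max_error = max(abs(a - p) for a, p in zip(averaged, pattern))
```

In this constructed case the disturbance visits every phase equally often
over the 37 cycles, so it cancels completely and max_error is at the level
of rounding errors; random noise is attenuated only gradually as the number
of averaged cycles grows.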



4.3.5. Principal component analysis 

The accuracy of the analysis of time sequence changes can be significantly
improved by applying methods that make it possible to remove irrelevant
signal components. Principal Component Analysis (PCA) was originated by
Pearson (1901) and developed by Hotelling (1933). It is a well-known
procedure that transforms a number of correlated variables into a smaller
number of uncorrelated ones, called principal components. It is defined as
the following linear transform (Jolliffe, 1986):

$$y = W u, \tag{4.23}$$

where $u$ is an $N$-dimensional vector of a stationary stochastic process,
$y$ is an $M$-dimensional vector of the process reduced by the
$(M \times N)$-dimensional matrix $W$, and $M < N$. As a result of this
transformation the output space $y$, whose size is reduced, should include
the most important information contained in the input process $u$. In other
words, the PCA transformation makes it possible to convert a large quantity
of information, which is included in mutually correlated input data, into a
set of uncorrelated components ordered according to their importance. The
transformation is thus a form of lossy compression, which is known in
communication theory as pattern processing (Jolliffe, 1986).

The transformation matrix $W$ may be obtained by estimating the eigenvalues
$\lambda_i$ and eigenvectors $w_i$, $i = 1, 2, \ldots, N$, of the square
symmetric autocorrelation or autocovariance matrix $R_{uu} = E[u u^T]$
determined for the input process $u$. The following equation holds:

$$R_{uu} w_i = \lambda_i w_i, \tag{4.24}$$

for $i = 1, 2, \ldots, N$. Ordering the eigenvalues in the descending order
$\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_N \geq 0$ and taking into
account only the first $M$ of them makes it possible to define the matrix
$W = [w_1, w_2, \ldots, w_M]^T$, which defines the PCA transformation in the
form (4.23).

An inverse process that consists in reproducing the information $u$ on the
basis of the vector $y$ can be defined as follows:

$$\hat{u} = R_{uu} W^T (W R_{uu} W^T)^{-1} y. \tag{4.25}$$

The minimum of the expectation value of the squared error
$E[\|u - \hat{u}\|^2]$ is obtained when the rows of the matrix $W$ consist
of the first $M$ eigenvectors of the matrix $R_{uu}$. Then $W W^T = I$,
$\hat{u} = W^T y$ and $R_{uu} = W^T Q W$, where
$Q = \mathrm{diag}[\lambda_1, \lambda_2, \ldots, \lambda_M]$, and the
correlation matrix $R_{yy}$ is equal to $R_{yy} = W R_{uu} W^T = Q$. The
diagonal form of the matrix $R_{yy}$ entails that all elements of the vector
$y$ are uncorrelated, with variances equal to the eigenvalues $\lambda_i$.

From the statistical point of view, the PCA transformation defined in such
a way determines the set of $M$ orthogonal vectors that contribute most to
the variance of input data. The first principal component, which is related
to the highest eigenvalue (the eigenvalues of the autocovariance matrix are
non-negative), represents the direction in the data space in which the data
reveal the highest variance. This variance is equal to the eigenvalue
related to the first principal component. The second principal component
describes the next orthogonal direction in the space that is characterised
by the highest data variance.

Usually, only a few of the highest principal components are responsible for
a significant part of the variance. Data projected onto the remaining
principal components are often characterised by an amplitude which is not
higher than the measurement noise. These components may be removed, which
enables us to reduce the data set without any significant loss of
information.
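The relations between the eigenvalues, the variance of the principal
component and the reconstruction error can be verified on a small example.
The sketch below is restricted to two input variables so that the
eigenproblem has a closed-form solution; it uses the autocovariance matrix
(the mean is subtracted first) and keeps one principal component. The test
data, the noise level and all function names are assumptions made for this
illustration.

```python
import math
import random


def covariance_2d(samples):
    """Mean vector and 2 x 2 autocovariance matrix of two-dimensional samples."""
    n = len(samples)
    m0 = sum(u[0] for u in samples) / n
    m1 = sum(u[1] for u in samples) / n
    c00 = sum((u[0] - m0) ** 2 for u in samples) / n
    c11 = sum((u[1] - m1) ** 2 for u in samples) / n
    c01 = sum((u[0] - m0) * (u[1] - m1) for u in samples) / n
    return (m0, m1), ((c00, c01), (c01, c11))


def eigen_symmetric_2x2(r):
    """Eigenvalues (descending) and the unit eigenvector of the larger one."""
    (a, b), (_, d) = r
    centre, radius = (a + d) / 2, math.hypot((a - d) / 2, b)
    lam1, lam2 = centre + radius, centre - radius
    if b != 0:
        v = (lam1 - d, b)
    else:
        v = (1.0, 0.0) if a >= d else (0.0, 1.0)
    norm = math.hypot(v[0], v[1])
    return (lam1, lam2), (v[0] / norm, v[1] / norm)


rng = random.Random(0)
samples = []
for k in range(200):
    s = math.sin(2 * math.pi * k / 20)      # common component of both channels
    samples.append((s + rng.gauss(0, 0.05), 0.5 * s + rng.gauss(0, 0.05)))

mean, r_uu = covariance_2d(samples)
(lam1, lam2), w1 = eigen_symmetric_2x2(r_uu)

# Transformation y = W u with a single row w1, then reconstruction W^T y.
y = [w1[0] * (u[0] - mean[0]) + w1[1] * (u[1] - mean[1]) for u in samples]
u_hat = [(mean[0] + w1[0] * yk, mean[1] + w1[1] * yk) for yk in y]

var_y = sum(yk ** 2 for yk in y) / len(y)
mse = sum((u[0] - h[0]) ** 2 + (u[1] - h[1]) ** 2
          for u, h in zip(samples, u_hat)) / len(samples)
```

The variance of y equals the larger eigenvalue and the mean square
reconstruction error equals the rejected eigenvalue, which here is at the
level of the added noise.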

The PCA method can also be used as a filter that attenuates noise in
measurement signals. The first $M$ principal components are retained and
the others, of a higher order, are rejected. The signal $u$ is then
reproduced only on the basis of the first $M$ principal components (4.25).

4.4. Non-parametric methods of signal feature estimation

4.4.1. Scalar feature estimation 

Signal estimation consists in determining the value of a given feature. When 
considering values one may distinguish two groups of features: a scalar fea- 
ture group (the value of a feature is a single number) and a function feature 
group (the value of a feature is a function of either time or frequency) . Since 
modern signal analysis is carried out using digital computers, continuous val- 
ues of a functional feature have to be digitised yielding sequences of samples 
that may be represented by one-column matrices (i.e., vectors). Therefore, 
such features will be further referred to as vector features. Scalar features are 
broadly applied in technical diagnostics. Mathematical expressions concern- 
ing the most important scalar features of signals are included in Table 4.1. 
These selected features have been defined for deterministic signals. The char- 
acteristic values of these features are given for the harmonic signal of the 
amplitude X. 



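Table 4.1. Selected signal scalar features

Feature name | Feature definition | Value for the harmonic signal
Mean value | $\bar{x} = \frac{1}{T} \int_{t}^{t+T} x(u)\, du$ | $\bar{x} = 0$
Absolute mean value | $x_{\mathrm{AVE}} = \frac{1}{T} \int_{t}^{t+T} \lvert x(u) \rvert\, du$ | $x_{\mathrm{AVE}} = \frac{2}{\pi} X \cong 0.637 X$
Root mean square value | $x_{\mathrm{RMS}} = \sqrt{\frac{1}{T} \int_{t}^{t+T} x^2(u)\, du}$ | $x_{\mathrm{RMS}} = \frac{X}{\sqrt{2}} \cong 0.707 X$
Absolute peak value | $x_{\mathrm{PEAK}} = \max_{t \leq u \leq t+T} \lvert x(u) \rvert$ | $x_{\mathrm{PEAK}} = X$
Positive peak value | $x_{\mathrm{PEAK+}} = \max_{t \leq u \leq t+T} x(u)$ | $x_{\mathrm{PEAK+}} = X$
Negative peak value | $x_{\mathrm{PEAK-}} = \min_{t \leq u \leq t+T} x(u)$ | $x_{\mathrm{PEAK-}} = -X$
Peak to peak value | $x_{\mathrm{P-P}} = x_{\mathrm{PEAK+}} - x_{\mathrm{PEAK-}}$ | $x_{\mathrm{P-P}} = 2X$
Form factor | $x_{\mathrm{RMS}} / x_{\mathrm{AVE}}$ | $\pi / (2\sqrt{2}) \cong 1.111$
Crest factor | $C = x_{\mathrm{PEAK}} / x_{\mathrm{RMS}}$ | $C = \sqrt{2} \cong 1.414$

The features included in Table 4.1 can also be applied to the analysis of
ergodic signals. However, the analysis of signals other than harmonic ones
requires the application of the expectation value operator $E\{\cdot\}$.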

The estimate of the mean value that is defined by (4.3) is the mean value
of the random signal determined at a given time moment $t$. The value is
estimated as the expectation value of the random variable
$\{x_a(t) \mid a \in \Gamma\}$.
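The scalar features of Table 4.1 are straightforward to evaluate for a
sampled signal. The sketch below replaces the integrals by averages over
the samples of exactly one period of a harmonic signal, which is an
approximation introduced here; the amplitude, the number of samples and the
dictionary keys are assumptions made for the example.

```python
import math


def scalar_features(x):
    """Discrete counterparts of the scalar features for the samples x."""
    n = len(x)
    ave = sum(abs(v) for v in x) / n
    rms = math.sqrt(sum(v * v for v in x) / n)
    peak = max(abs(v) for v in x)
    return {
        "mean": sum(x) / n,
        "ave": ave,
        "rms": rms,
        "peak": peak,
        "peak_to_peak": max(x) - min(x),
        "form_factor": rms / ave,
        "crest_factor": peak / rms,
    }


X = 2.0          # assumed amplitude of the harmonic signal
N = 10_000       # assumed number of samples in one period
harmonic = [X * math.sin(2 * math.pi * k / N) for k in range(N)]
features = scalar_features(harmonic)
```

For this signal the computed values agree with the last column of the
table, e.g. the RMS value is X divided by the square root of two and the
crest factor is the square root of two, independently of X.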

4.4.2. Spectral analysis 

The frequency domain is fundamental in diagnostic signal representation. 
The function feature of a signal estimated in this domain is the power spec- 
tral density. Methods of its estimation for the periodic signal are based on 
the Fourier theory, which assumes that each signal which can be physically 
realised can be considered as a linear combination of harmonic functions. One 
can distinguish four types of the Fourier transform. They are presented in 
Table 4.2. The differences between them are the way the analysed signal is 
described and the form of the results of the transformation (continuous or 
discrete one), called signal spectra. 



Table 4.2. Types of Fourier transforms

Type | Transform | Inverse transform
Continuous integral transform (continuous and infinite signal; continuous and infinite spectrum) | $X(f) = \int_{-\infty}^{+\infty} x(t) \exp(-j 2\pi f t)\, dt$, $-\infty < f < +\infty$ | $x(t) = \int_{-\infty}^{+\infty} X(f) \exp(j 2\pi f t)\, df$
Sampled functions (discrete signal; continuous and periodic spectrum) | $X(f) = \sum_{i=-\infty}^{\infty} x_i \exp(-j 2\pi f i \Delta t)\, \Delta t$, $-\frac{1}{2\Delta t} \leq f \leq \frac{1}{2\Delta t}$ | $x_i = \int_{-1/(2\Delta t)}^{1/(2\Delta t)} X(f) \exp(j 2\pi f i \Delta t)\, df$, $-\infty < i < \infty$
Fourier series (periodic signal; discrete spectrum) | $X_k = \int_{0}^{T} x(t) \exp(-j 2\pi t k \Delta f)\, dt$, $-\infty < k < \infty$ | $x(t) = \sum_{k=-\infty}^{\infty} X_k \exp(j 2\pi k \Delta f t)\, \Delta f$, $0 \leq t \leq T$
Discrete Fourier Transform (DFT) (discrete and periodic signal; discrete and periodic spectrum) | $X_k = \sum_{i=0}^{N-1} x_i \exp\left(-j \frac{2\pi i k}{N}\right) \Delta t$, $0 \leq k \leq N - 1$ | $x_i = \sum_{k=0}^{N-1} X_k \exp\left(j \frac{2\pi i k}{N}\right) \Delta f$, $0 \leq i \leq N - 1$

The analysis of digital signals consists in observing discrete signal
values in a finite time period. This way of signal observation can be
interpreted as the multiplication of a signal of infinite duration by a
rectangle function whose values are equal to zero outside the analysed range
of signal observation. The weighting function defined in such a way is
called a window function in the time domain (Bendat and Piersol, 1971). The
Fourier transform of the window function is called the window function in
the frequency domain. Apart from the rectangle window, numerous other
window functions have been introduced (Harris, 1978). They are chosen
depending on the signal class and the goal of its analysis. The application
of a window function in the time domain consists in the multiplication of
the signal by the given window function, which corresponds to the
convolution of the signal spectrum and the spectrum of this window.

Observing discrete signals in finite time periods and estimating the signal 
spectrum are always connected with the occurrence of several effects that can 
be interpreted as results of the following operations: 

The sampling and quantization of a continuous signal. The spectrum of such
a signal is also continuous and non-periodic. Sampling in the time domain
consists in multiplying the signal by a sampling distribution. The spectrum
of the distribution is continuous and periodic. In the frequency domain,
sampling corresponds to the convolution of the signal spectrum and the
sampling distribution spectrum, which introduces periodicity with a period
equal to the sampling frequency. This is the main reason for the occurrence
of the aliasing effect.

Observing a signal in the finite time period $(-T/2, T/2)$. It consists in
the application of the rectangle window, whose transform is a function of
the $(\sin x)/x$ type. As in the previous case, this operation is
represented in the frequency domain by the convolution of the discrete
signal spectrum and the window function spectrum. It is the reason for the
appearance of ripples called sidebands. The height of the sidebands and the
side-lobe fall-off depend on the window type. It should be stressed that
the finite time period of signal observation always introduces sidebands,
which are the reason for the power leakage effect (Randall, 1987).

Spectrum sampling. It corresponds to the multiplication of the signal
spectrum and the spectrum of a sampling distribution in the frequency
domain. The intervals between the impulses are equal to $\Delta f = 1/T$ and
are called the spectrum resolution. In the time domain, spectrum sampling
corresponds to the convolution of the signal and a sampling distribution
characterised by the period $T$. This operation is connected with the third
effect of signal analysis, called the picket fence effect. The phenomenon
consists in the distribution of signal power between several consecutive
bands. Its reason is the fact that the frequencies of the impulses of the
sampling distribution differ from the frequencies of the signal components.
In order to remove this effect, operations called spectrum corrections are
needed (Randall, 1987).
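Leakage and the picket fence effect can be observed with a directly
evaluated DFT. The sketch below compares a sinusoid that completes an
integer number of periods within the record with one that completes 8.5
periods; the record length, the frequencies and the scaling of the spectral
lines to amplitudes (2|X_k|/N, which is not appropriate for k = 0) are
assumptions made for the illustration, and no window other than the
rectangle one is applied.

```python
import cmath
import math


def dft_amplitudes(x):
    """Amplitude estimates 2 |X_k| / N for the lines k = 0 .. N/2 - 1."""
    n = len(x)
    return [
        2 * abs(sum(x[i] * cmath.exp(-2j * math.pi * i * k / n) for i in range(n))) / n
        for k in range(n // 2)
    ]


N = 64
on_line = [math.sin(2 * math.pi * 8.0 * i / N) for i in range(N)]    # 8 periods
off_line = [math.sin(2 * math.pi * 8.5 * i / N) for i in range(N)]   # 8.5 periods

amp_on = dft_amplitudes(on_line)
amp_off = dft_amplitudes(off_line)

largest_off_line = max(amp_off)
lines_above_5_percent = sum(1 for a in amp_off if a > 0.05)
```

In the first case the whole amplitude is concentrated in line 8. In the
second case the largest line shows only about 64 per cent of the true
amplitude and the remainder is spread over the neighbouring lines, which is
what spectrum corrections are meant to compensate.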

Spectrum deformations in the form of sidebands, which are caused by the
finite time period of observation, can be partly removed by the application
of windows other than the rectangle one (Lyons, 1997; Oppenheim and Schafer,
1975). The windows applied most often are the rectangle, Hanning,
Kaiser-Bessel and Flat Top ones (Gade et al., 1995). The characteristic
properties of each window are the width of the main spectral lobe expressed
in multiples of $\Delta f$, the damping of the first sideband expressed in
[dB] and the rate of sideband decay expressed in [dB] per decade. Fig. 4.8
shows the frequency characteristics of the rectangle and Hanning windows.




Fig. 4.8. Comparison of frequency characteristics of window functions. 
The continuous line - the rectangle window, the dashed line - the Hanning 
function with the length of the support equal to 1 [s] 



The width of the main lobe for the rectangle function is equal to
$2\Delta f$. For the Hanning function it is greater and equal to
$4\Delta f$, which means that the Hanning function is characterised by worse
spectral resolution. The damping of the first sideband is equal to 13 [dB]
for the rectangle function and to 32 [dB] for the Hanning function. The rate
of sideband decay is equal to 20 [dB] per decade for the rectangle window,
and for the Hanning function it is significantly greater and equal to
60 [dB] per decade. Summing up, the Hanning window is characterised by
better selectivity in comparison to the rectangle window. The application
of the window function can be interpreted as the application of a band-pass
filter.

The results of the Fourier transform can be presented in numerous ways.
One can distinguish the following kinds of spectra (Broch, 1984; Morel, 1992):

• amplitude spectrum (an ordinate is the absolute value of the Fourier 
transform) , 

• phase spectrum (an ordinate is the phase of the Fourier transform), 

• energy spectral density (an ordinate is the square value of the absolute 
value of the Fourier transform) , 

• power spectral density (an ordinate is the ratio of the square value 
of the absolute value of the Fourier transform to the length of the signal 
support). 



Power spectral density can also be estimated as a feature calculated for
two-channel signals, i.e., signals recorded at the same time. The result is
called cross power spectral density.

Since in most cases the analysed values of temporal signals are odd
functions of time, the application of the Fourier transform leads to a
complex-valued signal spectrum in the form of a double-sided spectrum,
usually denoted by $S(f)$. Because the double-sided spectrum is difficult to
interpret physically, a one-sided spectrum $G(f)$ is usually estimated. It
is defined as

$$G(f) = \begin{cases} 2S(f), & f > 0, \\ 0, & f < 0. \end{cases} \tag{4.26}$$



The complex power spectrum of a signal is described by the following 
expression: 

G{f) = \G{f) I exp ( - jV(/)) , (4.27) 

which enables us to estimate an amplitude and initial phase of a signal on 
the basis of the following formulae: 

• the amplitude of a signal is determined as an absolute value of the 
Fourier transform: 



|G(/)| = v'Re[G(/)]2+Im[G(/)]2, (4.28) 

where Re [G{f)] is the real part, and Im [G{f)] is the imaginary part of the 
Fourier transform G(/), 

• the initial phase is defined as follows: 

$$
\varphi(f) = \operatorname{arctg}\left(\frac{\mathrm{Im}[G(f)]}{\mathrm{Re}[G(f)]}\right). \tag{4.29}
$$

The estimation of the power spectral density of non-stationary signals 
requires additional assumptions related to the varying structure of 
these signals. One of the most common methods of such signal analysis is the 
Short Time Fourier Transform (STFT) (Kumar et al., 1992): 

$$
\mathrm{STFT}(\tau, f) = \int_{-\infty}^{\infty} x(t)\, w(\tau - t) \exp(-2j\pi f t)\, dt, \tag{4.30}
$$

where $x(t)$ is the analysed signal, $w(\tau - t)$ is a window function, and 
$\tau$ is the time-shift parameter that moves the window function along the 
time axis. Spectrum estimation with the use of the enumerated four types of 
the Fourier transform can be conducted in two ways: 

• with the use of the Fourier transform applied to the autocorrelation 
function in the time domain, 

• with the use of the Fourier transform applied to the signal in the time 
domain; the signal may be initially averaged. 

A third method of spectrum estimation is analogue filtering, carried out by 
means of a filter with a switched or swept centre frequency of the pass 
band. 
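
The quantities defined in (4.26), (4.28) and (4.29) can be illustrated with a short numerical sketch. It is only an illustration under stated assumptions: the function name `one_sided_spectrum`, the test signal and the scaling by the number of samples are choices made here, not part of the original text, and the phase is computed with the four-quadrant arctangent.

```python
import numpy as np

def one_sided_spectrum(x, fs):
    """Return frequencies, one-sided amplitude |G(f)| and phase of a real signal.

    Assumption: the double-sided spectrum S(f) is the FFT divided by the
    number of samples, so a cosine of amplitude A gives |S| = A / 2.
    """
    n = len(x)
    s = np.fft.fft(x) / n                    # double-sided spectrum S(f)
    freqs = np.fft.fftfreq(n, d=1.0 / fs)
    keep = freqs > 0                         # positive frequencies only
    g = 2.0 * s[keep]                        # G(f) = 2 S(f) for f > 0, cf. (4.26)
    amplitude = np.abs(g)                    # cf. (4.28)
    phase = np.arctan2(g.imag, g.real)       # four-quadrant form of (4.29)
    return freqs[keep], amplitude, phase

# Hypothetical test signal: 50 Hz cosine, amplitude 3, initial phase 0.5 rad
fs = 1000.0
t = np.arange(1000) / fs
x = 3.0 * np.cos(2 * np.pi * 50.0 * t + 0.5)
f, amp, ph = one_sided_spectrum(x, fs)
peak = np.argmax(amp)
print(f[peak], amp[peak], ph[peak])
```

Because the record holds an integer number of periods, the peak falls exactly on one frequency bin and no window is needed; for other signals a window function would normally be applied first.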

4.4.3. Higher order spectral analysis 

The autocorrelation function of stationary real-valued signals, which is 
defined with the use of the expectation operator E{·} (4.4), has found 
numerous applications. The most important of them are the identification 
of propagation paths of signals and the detection of the cyclic influence 
of their sources. Generalising the function (4.4) leads to higher order 
moments and their nonlinear combinations called cumulants. The 
cumulant of the n-th order is defined as the difference between the 
n-th moment of a signal x(t) and the n-th moment of an equivalent 
stationary signal with the normal distribution. It follows from this 
definition that the cumulant takes a zero value for 
signals that are characterised by the normal distribution. 

For a stationary real-valued signal $x(t)$ whose mean value equals zero, 
$E\{x(t)\} = 0$, the first, second, third and fourth order cumulants are given 
by the following formulae (Mendel, 1991; Nikias and Mendel, 1993): 

$$
C_{1x} = E\{x(t)\} = 0, \tag{4.31}
$$

$$
C_{2x}(k) = E\{x(t)x(t+k)\}, \tag{4.32}
$$

$$
C_{3x}(k,l) = E\{x(t)x(t+k)x(t+l)\}, \tag{4.33}
$$

$$
\begin{aligned}
C_{4x}(k,l,m) = {} & E\{x(t)x(t+k)x(t+l)x(t+m)\} - C_{2x}(k)C_{2x}(l-m) \\
& - C_{2x}(l)C_{2x}(k-m) - C_{2x}(m)C_{2x}(k-l).
\end{aligned} \tag{4.34}
$$

The first order cumulant is equal to the expectation value of the signal. The 
second order cumulant is called the covariance. Cumulants evaluated for zero 
time shifts define the following scalar features of signals: 

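• the variance: 

$$
\sigma_x^2 = C_{2x}(0), \tag{4.35}
$$

• the normalised skewness ratio, also called the normalised asymmetry 
ratio: 

$$
E\{x^3(t)\}/\sigma_x^3 = C_{3x}(0,0)/\sigma_x^3, \tag{4.36}
$$

• the normalised kurtosis ratio (the term $-3$ follows from (4.34) for 
$k = l = m = 0$): 

$$
E\{x^4(t)\}/\sigma_x^4 - 3 = C_{4x}(0,0,0)/\sigma_x^4. \tag{4.37}
$$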
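
The scalar features (4.35)-(4.37) can be estimated from samples by replacing expectations with time averages. The sketch below is a minimal illustration, assuming an ergodic signal; the function name `zero_lag_features` and the two test signals are hypothetical and serve only to show that the higher order features are close to zero for a normal distribution and clearly non-zero for a skewed one.

```python
import numpy as np

def zero_lag_features(x):
    """Sample estimates of the variance, normalised skewness and kurtosis.

    The mean is removed first, so the moments below are central moments.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    c2 = np.mean(x**2)                       # C2x(0), the variance
    c3 = np.mean(x**3)                       # C3x(0, 0)
    c4 = np.mean(x**4) - 3.0 * c2**2         # C4x(0, 0, 0), from (4.34)
    sigma = np.sqrt(c2)
    return c2, c3 / sigma**3, c4 / sigma**4

rng = np.random.default_rng(0)
gauss = rng.normal(size=200_000)             # normal distribution
expo = rng.exponential(size=200_000)         # skewed, heavy right tail

var_g, skew_g, kurt_g = zero_lag_features(gauss)
var_e, skew_e, kurt_e = zero_lag_features(expo)
print(var_g, skew_g, kurt_g)
print(var_e, skew_e, kurt_e)
```

For the exponential distribution the theoretical normalised skewness is 2 and the normalised kurtosis is 6, so the estimates should fall near these values, with some scatter that depends on the record length.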



The Fourier transform of the autocorrelation function (4.4) enables us to 
estimate the power spectral density in the following form: 

$$
S_{xx}(f) = \sum_{k=-\infty}^{+\infty} R_{xx}(k) \exp(-j2\pi f k). \tag{4.38}
$$

Similarly, the application of the Fourier transform to cumulants makes it 
possible to estimate higher order spectra (Mendel, 1991; Nikias and Mendel, 
1993): 

• the second order power spectrum: 

$$
S_{2x}(f) = \sum_{k=-\infty}^{+\infty} C_{2x}(k) \exp(-j2\pi f k), \tag{4.39}
$$

• the third order power spectrum (called also the bispectrum), with two 
independent frequencies: 

$$
S_{3x}(f_1, f_2) = \sum_{k=-\infty}^{+\infty} \sum_{l=-\infty}^{+\infty} C_{3x}(k,l) \exp\bigl(-j2\pi(f_1 k + f_2 l)\bigr), \tag{4.40}
$$

• the fourth order power spectrum (called also the trispectrum), with 
three independent frequencies: 

$$
S_{4x}(f_1, f_2, f_3) = \sum_{k=-\infty}^{+\infty} \sum_{l=-\infty}^{+\infty} \sum_{m=-\infty}^{+\infty} C_{4x}(k,l,m) \exp\bigl(-j2\pi(f_1 k + f_2 l + f_3 m)\bigr). \tag{4.41}
$$

Other interesting applications of higher order spectra are also known 
(Radkowski, 1996). The enumerated features of higher orders can be applied 
not only to a single signal but also to several signals at the same time. This 
approach is used in multi-channel signal analysis. A multi-channel signal 
is defined as a signal represented by a matrix whose row elements are values 
of channel signals recorded at the same time with the same recording 
parameters (e.g., sampling frequency, the length of the signal). 
The application of cumulants and their power spectra is particularly useful 
for the analysis of signals whose instantaneous values have a distribution 
that differs from the normal distribution. 
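
As a hedged illustration of (4.33), the third order cumulant of a zero-mean signal can be estimated for non-negative lags by a time average. The function name `c3` and the test signals are assumptions introduced here: a white Gaussian signal, and a first order moving average driven by skewed noise, for which the theoretical values are C3x(0,0) = 2.25 and C3x(0,1) = 1.

```python
import numpy as np

def c3(x, k, l):
    """Sample estimate of C3x(k, l) = E{x(t) x(t+k) x(t+l)} for lags k, l >= 0."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                         # enforce the zero-mean assumption
    n = len(x) - max(k, l)                   # number of usable products
    return np.mean(x[:n] * x[k:k + n] * x[l:l + n])

rng = np.random.default_rng(2)
n = 200_000
gauss = rng.normal(size=n)                   # normal distribution

e = rng.exponential(size=n) - 1.0            # zero-mean, skewed driving noise
skewed = e.copy()
skewed[1:] += 0.5 * e[:-1]                   # x(t) = e(t) + 0.5 e(t-1)

print(c3(gauss, 1, 2))                       # close to zero
print(c3(skewed, 0, 0), c3(skewed, 0, 1))    # close to 2.25 and 1.0
```

Applying a two-dimensional discrete Fourier transform to a grid of such estimates would give a simple, unsmoothed estimate of the bispectrum (4.40).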

4.4.4. Analysis with the use of the wavelet transform 

As was mentioned in the previous sections of this chapter, the analysis of 
non-stationary signals requires the application of a different model in 
comparison to methods of stationary signals. This consists mostly in applying 
such signal transformations that lead to two-dimensional representations of 
signals. An example of such a transformation is the short-time Fourier 
transform, defined both in the time and frequency domains. According to 
(4.30), the algorithm of the STFT may be interpreted as two operations: the 
shift of a window along the signal being analysed and the estimation of the 
spectrum of the windowed signal. Consecutive spectra 
are determined by the window placement. The spectra ordered in time are 
called the time-frequency characteristics, usually presented in the form of a 
waterfall diagram. An example of this diagram is shown in Fig. 4.9. 
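
The two operations described above, shifting a window and estimating the spectrum of each windowed segment, can be sketched as follows. This is a minimal discrete illustration, not the authors' implementation: the function name `stft`, the Hann window, the window length and the hop size are all assumptions chosen for the example.

```python
import numpy as np

def stft(x, fs, win_len, hop):
    """Magnitude STFT: one spectrum per window position (rows ordered in time)."""
    window = np.hanning(win_len)
    starts = np.arange(0, len(x) - win_len + 1, hop)
    # Operation 1: shift the window along the signal
    frames = np.array([x[s:s + win_len] * window for s in starts])
    # Operation 2: estimate the spectrum of every windowed segment
    spectra = np.fft.rfft(frames, axis=1)
    times = (starts + win_len / 2) / fs      # centre of each window [s]
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)
    return times, freqs, np.abs(spectra)

# Hypothetical non-stationary signal: 50 Hz for the first second, then 200 Hz
fs = 1000.0
t = np.arange(2000) / fs
x = np.where(t < 1.0, np.sin(2 * np.pi * 50 * t), np.sin(2 * np.pi * 200 * t))

times, freqs, mag = stft(x, fs, win_len=200, hop=100)
dominant = freqs[np.argmax(mag, axis=1)]     # strongest component per window
print(dominant[0], dominant[-1])
```

Each row of `mag` is one line of a waterfall diagram. The fixed `win_len` sets both the time and the frequency resolution for all components at once.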

Fig. 4.9. Waterfall plot 

The application of analysis based on the STFT gives good results in the case 
of stationary signal analysis or the analysis of such non-stationary signals 
that are a linear combination of only narrow band components or exclusively 
wide band components (Dalpiaz and Rivola, 1995; Kumar et al., 1992; 
Mączak et al., 1996). However, most physical processes taking place during 
a technical object's operation are characterised by the appearance of high 
frequency components, which are effects of short-term phenomena, and, at 
the same time, low frequency components, which usually correspond to 
long-term phenomena (Dalpiaz and Rivola, 1995). In the case of such signals 
the method based on the STFT reveals disadvantages that are first of all 
caused by the fixed width of the window. An alternative is wavelet analysis, 
which consists in applying the Wavelet Transform (WT) (Cohen and Kovacevic, 
1996; Daubechies, 1992). 

Wavelet analysis, similarly to the Fourier method, consists in signal 
decomposition and leads to a signal representation in the form of a linear 
combination of given base functions called wavelets (Daubechies, 1992). A 
particularly important property of the analysis is multi-stage signal 
decomposition and varying time-frequency resolution. Moreover, it is also 
possible to apply base functions other than harmonic ones. Wavelet analysis 
may be interpreted as the application of a set of band-pass filters with a 
constant relative width of frequency bands (Vetterli and Herley, 1992). Like 
the short-time Fourier method, wavelet analysis results in a two-dimensional 
signal representation (Daubechies, 1992). 

During signal analysis, the base functions of the wavelet transform are 
subjected to two operations: time shifting and changing the length of the 
function support by means of different values of a scale parameter. These 
operations are defined as follows: 

$$
\psi_{s,\tau}(t) = \frac{1}{\sqrt{s}}\, \psi\!\left(\frac{t - \tau}{s}\right), \tag{4.42}
$$

where $t$ is the time, $\tau$ is the time shift, and $s$ is the scale parameter. 

According to the Heisenberg Uncertainty Principle it is assumed that 
the product of the base function parameters (the scale s and time shift r) 
is constant and may be interpreted as an area of the given data window. 
Proper changes of these parameters make it possible to obtain varying time- 
frequency resolution. The base functions of the wavelet analysis differ from 
the base functions applied in the Fourier method. The differences lie not only 
in the shape but also in the length of the function support (the range within 
which the function values are not equal to zero). In the Fourier analysis the 
length of the support is infinite, whereas the base function of the wavelet 
analysis is characterised by a finite and, in most cases, short length of the 
support. Moreover, in the wavelet analysis the function shape may be chosen 
in a way that makes it possible to satisfy some requirements dealing with the 
possibility of expanding the function into the series form. The comparison 
of the base functions in the wavelet and the Fourier analyses is shown in 
Fig. 4.10. 

Fig. 4.10. Comparison of the results of the STFT and WT analyses 
(Cohen and Kovacevic, 1996) 

Characteristics obtained with the use of wavelet analysis are called 
scalograms or time-scale characteristics. They are usually presented in the 
form of waterfall or contour diagrams. In this case, the time axis 
corresponds to the time shift of the base function. The shift expressed in 
time units may be interpreted as consecutive time moments related to the 
consecutive values of the signal, which is similar to the application of the 
STFT analysis. The second axis contains scale values that may be interpreted 
as the inverse of the frequency. The following relationship holds between the 
scale and the shift: 

$$
s = s_0^m, \qquad \tau = n s_0^m \tau_0, \tag{4.43}
$$

where $s_0$ is the basic scale, $\tau_0$ is the basic shift, and $m$ and $n$ 
are integers. 

Depending on the type of the wavelet transform (e.g., continuous, CWT; 
discrete, DWT), these parameters take specific values. A particular case 
of scale parameters are integer values equal to consecutive powers of two 
(Daubechies, 1992), which leads to a decomposition of the signal in the oc- 
tave bands of the frequency. 

The continuous wavelet transform can be defined by the following formula 
(Daubechies, 1992): 

$$
\mathrm{CWT}(s, \tau) = \frac{1}{\sqrt{s}} \int_{-\infty}^{\infty} \psi\!\left(\frac{t - \tau}{s}\right) x(t)\, dt, \tag{4.44}
$$

where $s$ is the scale parameter, $\tau$ is the shift of the base function, 
$t$ is the time, $\psi(\cdot)$ is the base function, and $x(t)$ is the analysed 
signal. In the continuous transform the parameters $s$ and $\tau$ take real 
values. In this case the wavelet coefficients 
are interpreted as a measure of the similarity of the signal (within a given 
window) to the base function. 

The most important for applications are the non-continuous types of the 
transform. One can distinguish a discrete form of the CWT and the discrete 
wavelet transform (DWT). The results of these forms of wavelet transforms are 
coefficients of the wavelet series. It should be stressed that the discrete 
form of the CWT is completely different from the DWT. The results of the 
application of the CWT are measures of the similarity of the base function to 
the signal, whereas the application of the DWT is analogous to filtering with 
a constant relative width of the frequency band. The DWT application 
requires defining the base functions taking into account filters that 
correspond to these functions. Nowadays, methods connected with the 
application of sets of orthogonal filter banks are being developed. 

The quality of the obtained results, and particularly the possibility of 
simultaneous identification of components that are results of phenomena of 
different characters, are connected with a proper choice of the base function. 
There is no general rule of that choice. However, in (Timofiejczuk, 2001) it was 
shown that the wavelet analysis makes it possible to obtain interesting results. 
What is particularly interesting is the application of such base functions that 
can model changes of selected features of the observed signal or phenomenon. 
For instance, impulse phenomena can be correctly identified with the use 
of a step base function, whereas base functions that are products of the 
Gaussian and harmonic functions give good results in the identification of 
resonance phenomena. An example of a function that made it possible to 
obtain the best results in the observation of rotating machinery was the 
Morlet function (Fig. 4.11). This function enables us to identify phenomena 
of different characters reflected simultaneously in the analysed signal. 
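
A base function of the kind mentioned above, a product of a Gaussian and a harmonic function, can be used in a direct numerical evaluation of (4.44). The sketch below is an illustration only. It assumes a complex Morlet-type function with unit centre frequency at scale 1, uses the complex conjugate of the base function (a common convention for complex wavelets that (4.44) does not state), and replaces the integral by a sum; the names `morlet` and `cwt` are hypothetical.

```python
import numpy as np

def morlet(u, f0=1.0):
    """Gaussian envelope times a complex harmonic (assumed Morlet-type form)."""
    return np.exp(-0.5 * u**2) * np.exp(2j * np.pi * f0 * u)

def cwt(x, fs, scales, f0=1.0):
    """Direct evaluation of the CWT for all shifts tau on the sampling grid."""
    t = np.arange(len(x)) / fs
    coeffs = np.empty((len(scales), len(x)), dtype=complex)
    for i, s in enumerate(scales):
        # rows: shifts tau, columns: time t; argument (t - tau) / s
        u = (t[None, :] - t[:, None]) / s
        psi = morlet(u, f0) / np.sqrt(s)
        coeffs[i] = (np.conj(psi) @ x) / fs   # integral approximated by a sum
    return coeffs

# Hypothetical test signal: 50 Hz cosine sampled at 1 kHz
fs = 1000.0
t = np.arange(1000) / fs
x = np.cos(2 * np.pi * 50.0 * t)

freq_grid = np.arange(10.0, 101.0, 5.0)      # frequencies assigned to scales
scales = 1.0 / freq_grid                     # scale as the inverse of frequency
coeffs = cwt(x, fs, scales)
dominant = freq_grid[np.argmax(np.abs(coeffs[:, 500]))]
print(dominant)
```

The rows of `np.abs(coeffs)` form a scalogram. The direct matrix evaluation is slow for long records and is used here only because it follows (4.44) term by term.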




Fig. 4.11. Morlet base function 



Research on rotating machinery operating in varying conditions 
(Timofiejczuk, 1997; 2001), which consisted in the analysis of 
non-stationary signals, showed that synchronising the analysis parameters 
with the parameters of machinery operation improves the identification of 
individual signal components. The most interesting case was the 
synchronisation of the analysis parameters with the values of the 
characteristic frequency that resulted from the rotating speed (frequency) 
of the machinery shaft. 

4.4.5. Analysis with the use of the Wigner-Ville transform 

The Wigner-Ville transform was first applied to the analysis of impulse 
responses (Gade and Gram-Hansen, 1996; Kumar et al., 1992; Mączak 
et al., 1996). The characteristic property of this transform is the lack of 
limitations concerning the resolution both in the frequency and time domains. 
The Wigner-Ville transform is a general form of a transform that does not 
require the determination of a base function. The formula of the transform 
is as follows (Gade and Gram-Hansen, 1996): 

$$
W_x(t, f) = \int_{-\infty}^{\infty} x\!\left(t + \frac{\tau}{2}\right) x^{*}\!\left(t - \frac{\tau}{2}\right) \exp(-j2\pi f \tau)\, d\tau, \tag{4.45}
$$

where $x(t)$ is the analysed signal, and $x^{*}(t)$ is the complex conjugate 
of $x(t)$. 

The transform (4.45) may be interpreted as a combination of the Fourier 
transform and the autocorrelation function (Mączak et al., 1996). A result of 
that combination is a spectrum considered as a time function or the 
autocorrelation function interpreted as a frequency function. Apart from 
that, the Wigner-Ville transform is very often interpreted as a 
two-dimensional mutual correlation and may be treated as a time-varying 
spectral density of the kind estimated by the STFT. It should be stressed 
that, in contrast to Fourier methods, the Wigner-Ville analysis does not 
represent the signal as a linear combination of given base functions. 

The results of the application of the transform are often difficult to 
interpret. The lack of resolution limitations (in comparison to other 
transforms) is achieved at the expense of the clarity of time-frequency 
characteristics (Gade and Gram-Hansen, 1996). It should be stressed that 
the application of the discrete version of the transform requires a stricter 
form of the commonly known sampling theorem. In order to avoid aliasing, the 
sampling frequency must be at least four times greater than the maximum 
frequency of the signal. This requirement 
results from the estimation of a product of pairs of signal values (4.45). 

As advantages of that transform one can mention the great possibility of 
differentiating between signals whose amplitude or phase change. The trans- 
form gives good results in the estimation of signals that contain components 
whose length of supports (in the time domain) is comparable. The analysis 
of signals that are effects of phenomena whose duration significantly differs 
gives worse results (Gade and Gram-Hansen, 1996). 
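
A discrete counterpart of (4.45) can be sketched by forming, for every time index, the products of pairs of signal values placed symmetrically around it and taking the Fourier transform over the lag. The code below is a simplified illustration under assumptions made here: a complex (analytic) input signal, no smoothing window, and the hypothetical function name `wigner_ville`. Since the two samples of each pair are 2m samples apart, bin k corresponds to the normalised frequency k/(2N), which is one way to see why a real signal has to be sampled at no less than four times its maximum frequency.

```python
import numpy as np

def wigner_ville(x):
    """Discrete Wigner-Ville distribution of a complex signal.

    Row n holds the FFT over the lag m of x[n+m] * conj(x[n-m]).
    Bin k of a row corresponds to k / (2 * N) cycles per sample.
    """
    x = np.asarray(x, dtype=complex)
    n_samples = len(x)
    wvd = np.zeros((n_samples, n_samples))
    for n in range(n_samples):
        m_max = min(n, n_samples - 1 - n)    # largest lag inside the record
        m = np.arange(-m_max, m_max + 1)
        kernel = np.zeros(n_samples, dtype=complex)
        kernel[m % n_samples] = x[n + m] * np.conj(x[n - m])
        # The kernel is conjugate-symmetric in m, so its FFT is real
        wvd[n] = np.fft.fft(kernel).real
    return wvd

# Hypothetical test signal: complex exponential at 0.125 cycles per sample
n_samples = 128
f0 = 0.125
x = np.exp(2j * np.pi * f0 * np.arange(n_samples))

wvd = wigner_ville(x)
k_peak = int(np.argmax(wvd[n_samples // 2]))
print(k_peak, k_peak / (2 * n_samples))
```

For a signal with several components the same computation also produces cross terms between them, which is the loss of clarity mentioned above.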



4.5. Parametric methods of signal estimation 

The general methods of signal analysis, which consist in the estimation of 
non-parametric signal features, were described in the previous section. These 
methods do not require particular assumptions about the signal form. Their 
disadvantage is often the large size of the sets of estimated feature 
values. Another group of methods of signal analysis are parametric methods. 
They originated as methods of modelling systems, in most cases linear 
ones. The results of parametric methods are represented by only 
a few parameters, far fewer than the results of non-parametric 
methods. Unfortunately, the models being discussed require an exact 
definition of their structure. Wrong decisions about the choice of the 
structure of the model at the first stage of model identification often 
result in a poor quality of the estimated model. 

Restricting consideration to linear models and taking into account the fact 
that the most useful concept is a model with a single input and a single 
output, one can define the general discrete model as follows: 

$$
A(q)y(t) = \frac{B(q)}{F(q)}\, x(t - n_u) + \frac{C(q)}{D(q)}\, e(t), \tag{4.46}
$$

where $q$ is the shift operator ($q^{-1}$ being the unit delay), $x$ is an 
input of the system, $y$ is an output signal of the system, $e$ is an 
interfering signal in the form of white noise with a mean value equal to 
zero and a constant root mean square value, and $A$, $B$, $C$, $D$, $F$ are 
polynomials of the delay operator $q^{-1}$: 

$$
\begin{aligned}
A(q) &= 1 + a_1 q^{-1} + a_2 q^{-2} + \cdots + a_{n_a} q^{-n_a}, \\
B(q) &= b_0 + b_1 q^{-1} + b_2 q^{-2} + \cdots + b_{n_b} q^{-n_b}, \\
C(q) &= 1 + c_1 q^{-1} + c_2 q^{-2} + \cdots + c_{n_c} q^{-n_c}, \\
D(q) &= 1 + d_1 q^{-1} + d_2 q^{-2} + \cdots + d_{n_d} q^{-n_d}, \\
F(q) &= 1 + f_1 q^{-1} + f_2 q^{-2} + \cdots + f_{n_f} q^{-n_f},
\end{aligned} \tag{4.47}
$$

so that, for example, 

$$
A(q)y(t) = y(t) + a_1 y(t-1) + \cdots + a_{n_a} y(t - n_a). \tag{4.48}
$$

The structure of the model considered is determined by the number of 
the polynomial coefficients (4.47). Different versions of the general model 
(4.46) can be used. They are applied depending on the goal of research and 
accessible information about the observed object that is the source of the 
analysed signal. An interesting formal operation is to make the assumption 
that the analysed object has no input, which leads to the autoregressive 
(AR) model of the signal: 

$$
A(q)y(t) = e(t), \tag{4.49}
$$

or to the autoregressive moving average (ARMA) model: 

$$
A(q)y(t) = C(q)e(t). \tag{4.50}
$$







The values of the coefficients of the corresponding operator polynomials 
A(q), C(q) are interpreted as a set of values of features of the signal 
described with the use of the models (4.49) and (4.50). These models may 
be interpreted as linear filters which produce an estimate of the signal 
as a result of noise filtering. Signal features are assumed to be the 
parameters that define the characteristics of these filters. 

The description of problems connected with the identification of 
parametric models of signals can be found in numerous publications, e.g., 
(Box and Jenkins, 1976; Söderström and Stoica, 1994). 
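
As an illustration of how the coefficients of A(q) in (4.49) can serve as signal features, the sketch below simulates a second order autoregressive signal and recovers its coefficients by ordinary least squares. Least squares is only one of several identification methods and is chosen here for brevity; the function name `fit_ar`, the model order and the true coefficients are assumptions of the example.

```python
import numpy as np

def fit_ar(y, order):
    """Least-squares estimate of a_1..a_na in y(t) + a_1 y(t-1) + ... = e(t)."""
    y = np.asarray(y, dtype=float)
    # Column i holds y(t - i - 1) for t = order, ..., N - 1
    lagged = np.column_stack(
        [y[order - i - 1:len(y) - i - 1] for i in range(order)]
    )
    target = y[order:]
    # Rearranged model: y(t) = -a_1 y(t-1) - ... - a_na y(t-na) + e(t)
    neg_a, *_ = np.linalg.lstsq(lagged, target, rcond=None)
    return -neg_a

# Simulate a stable AR(2) signal with hypothetical coefficients
rng = np.random.default_rng(1)
a_true = np.array([-1.5, 0.7])
e = rng.normal(size=5000)                    # white noise, zero mean
y = np.zeros_like(e)
for t in range(2, len(e)):
    y[t] = -a_true[0] * y[t - 1] - a_true[1] * y[t - 2] + e[t]

a_hat = fit_ar(y, order=2)
print(a_hat)
```

The two estimated numbers summarise the whole record of 5000 samples. Choosing a wrong `order` would illustrate the sensitivity to model structure discussed above.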



4.6. Signal features estimated with respect to object properties 

The estimation of signals with time dependent features requires special meth- 
ods of their analysis. A bibliography review reveals that there are no univer- 
sal methods of non-stationary signal estimation. Suggestions discussing the 
application of a particular method are in most cases connected with the 
identification of the kind of non-stationarity and the goal of the analysis. 
This entails that such signal analysis requires an assumption about their 
models. 

An example of a particular class of non-stationary signals is the one 
recorded during operation of rotating machinery in varying conditions, e.g., 
during run-up or run-down. They are always characterised by a specific struc- 
ture. The main goal of such an analysis of signals (e.g., vibration) is the 
identification of changes of the technical state of the machinery. Taking in- 
to account the goals of machinery diagnostics, the most interesting issues 
are observations of objects during operation in varying conditions caused 
by a varying rotating frequency of the shaft, which is considered the char- 
acteristic frequency (Cholewa, 1976; Moczulski, 1984; Timofiejczuk, 2001). 
In this case, special requirements of signal analysis are needed. They are 
particularly related to the need of the identification and distinction of symp- 
toms of phenomena that occur during the operation of the object in varying 
conditions. They are divided into two groups. One can distinguish a group 
of signal features (state symptoms) that are related to the varying condi- 
tions of the object operation (e.g., connected to changes of the rotating 
speed) and a group of signal features which are connected to such values 
of frequency that are characteristic for the observed object (e.g., resonance 
frequency). It is assumed (Cholewa, 1976) that it is possible to consider 
the first group (called the representative one) as independent of the res- 
onance properties of the observed machine, whereas the second group of 
features (called the resonance one), reflecting the resonance properties of 
the examined object, may be considered as independent of the operating 
conditions. 

Signals recorded during the operation of the machine in varying conditions 
often contain simultaneously wide- and narrow-band components that are 
effects of phenomena of different duration. The identification of these 
components requires an analysis that leads to a two-dimensional 
representation of a signal, which makes it possible not only to identify the 
individual components of the signal but also to recognise their variability 
both in the time and frequency domains. There are several ways of performing 
such signal analyses. The application of two methods deserves particular 
attention. They are analyses based on the short time Fourier transform 
(Moczulski, 1984) and on the wavelet transform (Timofiejczuk, 2001). Both of 
these methods lead to a two-dimensional signal representation and make it 
possible to identify signal components and their variability. However, their 
application does not enable us to divide the identified symptoms into the two 
groups mentioned above, which is often the reason for a misinterpretation of 
the results of the time-frequency analysis. 

The distinction between symptoms identified during time-frequency sig- 
nal analysis may be conducted by means of the RSL method (Cholewa, 1974; 
1976). The main assumption of the RSL method is that the analysed signal 
is an effect of periodic excitations and resonance properties of the machinery. 
In this approach the identified characteristics are considered as absolute fre- 
quency functions (resonance components of a signal) and relative frequency 
functions that are related to the exciting frequency (representative compo- 
nents of a signal). 

The scheme of the analysis of time-frequency characteristics with the use 
of the RSL method is shown in Fig. 4.12. The horizontal axis is described 
by values of the centre frequencies $f_j$ of bands that are estimated with a 
constant relative width. The vertical axis is related to the time of 
observation and in the figure is described by the rotating frequency $f_n$. 




Fig. 4.12. Scheme of time-frequency characteristics 
[Figure: horizontal axis $\log f_j$; cross-sections $f_n = \mathrm{idem} \to W$, $f_j = \mathrm{idem} \to R$, $f_j/f_n = \mathrm{idem} \to S$] 

The set of features W results from a cross-section of the characteristics 
made on the assumption $f_n = \mathrm{idem}$. The RSL method makes it possible 
to identify: 

• symptoms that are effects of resonance excitations, which are 
represented by the resonance components R of a signal. They are estimated as 
a cross-section calculated for $f_j = \mathrm{idem}$, 

• symptoms that are results of periodic excitations, defined by the 
representative components S of a signal. They are estimated as a 
cross-section calculated for $f_j/f_n = \mathrm{idem}$. 

Fig. 4.13. Scheme of the RSL decomposition 

RSL decomposition is based on the assumption that time-frequency 
characteristics are estimated within bands characterised by a constant 
relative width, and that the values of characteristic parameters 
(characteristic frequency) for each set W are equal to the values of the 
centre frequencies of consecutive bands or to the consecutive values of the 
scale coefficients. These assumptions are shown in Fig. 4.13, which 
illustrates the distinction between the symptoms. This makes it possible to 
consider a signal as a composition of two signals, a resonance one and a 
representative one, respectively. The decomposition may be applied to 
time-frequency characteristics that are either the results of the STFT 
analysis (or an equivalent one) (Cholewa, 1974) or the results of the WT 
analysis (Timofiejczuk, 2001). In both cases the general idea of symptom 
separation is the same. The only difference is the way the feature set 
denoted by W is treated. In the Fourier analysis each consecutive W is a 
signal spectrum that determines the power distribution of the signal in 
consecutive frequency bands characterised by a constant relative width. The 
spectrum is defined as a sequence of levels (bands) characterised by the 
central frequencies $f_p, \ldots, f_k$: 

$$
W = \{w(j);\ j = j_p, \ldots, j_k\}, \tag{4.51}
$$

where $w(j)$ is the level of the signal power [dB] in the $j$-th frequency 
band with a central frequency equal to $f_j$ [Hz]. 

In the wavelet analysis W is the set of wavelet coefficients which 
determine the measure of the similarity of a given signal to the base 
function corresponding to a given frequency (or scale). The point of 
difference between these representations is the level interpretation. In the 
signal spectrum case the levels may be treated as a sum of the components S 
and R, whereas in the wavelet case the levels estimated at given time moments 
and for given scale values are logarithms of wavelet coefficients 
(Timofiejczuk, 2001). Assuming the model of the substitutional source of a 
signal (Fig. 4.14), the spectrum of the signal can be considered as a sum of 
three independent spectra: 

$$
W = R + S + L \quad [\mathrm{dB}], \tag{4.52}
$$

where W is the power spectrum of the analysed signal, S is the spectrum of 
the representative signal, R is the spectrum of the resonance signal, and L 
is the spectrum of pink noise. 




Fig. 4.14. Model of the substitutional source of a signal (Cholewa, 1976) 



The RSL analysis ought to be applied with the assumption that the spec- 
tra ordered in time (run-up or run-down characteristics) are estimated with 
a constant relative width of frequency bands, and the consecutive analysed 
values of a rotating frequency correspond to the values of the central frequen- 
cies of the consecutive spectrum bands. On this assumption it is obvious that 
the representative signal is related to changes of the object operation and to 
phenomena taking place in the object. Invariability in the relative frequency 
scale related to the values of the excitation frequency fm is characteristic for 
this spectrum. Similarly, the resonance signal is independent of changes of 
machinery operating conditions. The spectrum of this signal contains features 
that carry information about the resonance properties of the object. 

The RSL method requires the estimation of at least two consecutive spectra 
W calculated for successive values of the characteristic frequency. Apart 
from the above requirements, the algorithm of the method consists in solving 
the following system of equations: 

$$
\begin{aligned}
W_m &= S_m + R + L_m, \\
W_{m+1} &= S_{m+1} + R + L_{m+1},
\end{aligned} \tag{4.53}
$$

where $S_m$, $S_{m+1}$, $R$, $L_m$, $L_{m+1}$ are unknown. The spectra of the 
representative and resonance signals are represented by the sequences 
$S_m = \{s_m(j),\ j = j_p, \ldots, j_k\}$ and $R = \{r(j),\ j = j_p, \ldots, j_k\}$. 
Here $s_m(j)$ is the value of the power level of the representative signal in 
the $j$-th band. These bands correspond to the excitation frequency for an 
object whose characteristic frequency is included in the $m$-th frequency 
band. The system of equations (4.53) has an infinite number of solutions, 
and the invariants of these solutions are the differential representative 
and resonance spectra (Cholewa, 1976). 
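
The role of the differential spectra as invariants can be illustrated on synthetic data. The sketch below is a simplified reading of (4.53), not the RSL algorithm itself. It assumes that the noise term L is negligible, and that the representative spectrum keeps its shape on the relative band axis, so that it moves by exactly one band when the characteristic frequency moves by one band, i.e. $s_m(j) = s(j - m)$. The spectra `R` and `s_rel` are hypothetical.

```python
import numpy as np

n_bands = 12
j = np.arange(n_bands)

# Hypothetical resonance spectrum [dB], fixed on the absolute band axis
R = 10.0 * np.exp(-0.5 * ((j - 8) / 1.5) ** 2)

def s_rel(i):
    """Hypothetical representative spectrum [dB] on the relative axis i = j - m."""
    return 20.0 - 3.0 * np.abs(i)

def observed(m):
    """Spectrum W_m = S_m + R for the characteristic frequency in band m (L = 0)."""
    return s_rel(j - m) + R

w_m, w_next = observed(3), observed(4)

# Along f_j / f_m = idem (shift by one band) the representative part cancels:
d_resonance = w_next[1:] - w_m[:-1]          # equals R(j+1) - R(j)

# Along f_j = idem (same band) the resonance part cancels:
d_representative = w_next - w_m              # equals s(j-m-1) - s(j-m)

print(np.allclose(d_resonance, np.diff(R)))
```

Only differences of R and of S are recovered in this way. Their absolute levels remain undetermined, which is consistent with the system (4.53) having infinitely many solutions.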



4.7. Summary 

In this chapter, selected methods of signal analysis, particularly those that are 
useful in diagnostic research, were presented. Non-parametric and parametric 
methods were enumerated and described. The lack of generally accepted 
recommendations that could be the basis for the selection of the presented 
methods in some given applications was stressed. A suggestion concerning 
the application of parametric methods in the case of the analysis of limited 
sets of data (e.g., time series of temporal values of a signal that contains only 
a small number of samples) was made. These methods are characterised by 
a limited number of estimated signal features. If needed, the signal features 
obtained in such a way may be transformed into features corresponding to 
the results of non-parametric analysis, which often ensures a quality superior 
to the one that is obtained with the direct use of non-parametric methods. 

Particular attention was paid to methods of signal analysis that take 
into consideration the specific properties and characteristics of the examined 
objects as well as conditions of their operation. These methods enable us to 
observe transient states, which are often rich sources of information about 
the examined object. 



Chapter 5 



CONTROL THEORY METHODS IN 
DESIGNING DIAGNOSTIC SYSTEMS 

Zdzisław KOWALCZUK, Piotr SUCHOMSKI 



5.1. Introduction 

Methodologies of the technical diagnostics of dynamic processes, represented 
by the acronym FDI referring to Fault Detection and Isolation (Chen and 
Patton, 1999; Gertler, 1995; 1998; Isermann, 1984; Frank, 1990; Patton and 
Chen, 1993; Willsky, 1976), commonly concern three principal diagnostic op- 
erations (performed in parallel or in sequence) : detection (the discovery that 
something has gone wrong in the monitored/supervised process), isolation 
(the differentiation, separation, localisation of a fault, leakage, bias, etc., or 
the indication of a faulty element) and the identification of a fault (the de- 
termination of the value or magnitude of this error). 

While constructing suitable diagnostic systems we can employ a wealth 
of design methodologies, worked out within many developed branches of con- 
trol theory, such as the modelling and identification of dynamic systems, 
including state estimation. Thus, for instance, techniques based on consis- 
tency modelling (describing direct relations of the consistency, or parity, of 
measurement data), diagnostic state observers, and Kalman filters are fun- 
damental methods of designing generators of residues. Such approaches are 
within the scope of this chapter. 

Diagnostic decisions concerning the monitored dynamic process are usually 
derived from the results of the fault detection procedure, which analyses 
residual vectors (residues) that are suitably generated on the basis of the 
errors of the estimates of plant output signals. The estimation is 
conditioned by a rationally chosen mathematical model. 

Such a model of the supervised process, under many simplifying assumptions, 
can be expressed in the form of linear operator transfer functions or flow 
graphs with arcs or branches determined by transfer functions (Gertler, 1998; 
Gertler et al., 1995; Kowalczuk and Gertler, 1992). 

A synthesis procedure founded on such models enables the designer to 
construct diagnostic (detective) generators of residual signals, which 
should also have suitable (structurally redundant) properties allowing, 
after a statistical treatment, the derivation of useful inferences of 
practical significance. 

Thus a methodical design of a residue generator requires both an 
appropriate specification of the desired characteristics of the vector 
residue and a fitting implementation of this generator. It is also 
recommended that the 
whole set of possible residual signals have distinctive features which facilitate 
the differentiation of (at least individual, singular) errors, representing faults, 
defects, or permanent failures. We can mention here (Gertler, 1998) such fea- 
tures as the direction (a singular fault invokes a residual vector of a fixed 
direction in the space of residual vectors), the structure (all residual signals 
of a given fault are restricted to a separate and fault-specific subspace of the 
space), or even orthogonality or diagonality (if the fixed directions are mu- 
tually orthogonal or are equivalent to separate one-dimensional subspaces). 

Static parity relations were applied in system diagnostics in the 1970s 
and the early 1980s, while dynamic relations were introduced by Chow and 
Willsky in 1984. Diagnostic observers, which appeared in the early 1970s, have 
evoked great interest among system engineers. Some essential contributions to 
this subject can be found, for instance, in the works of Massoumnia (1986), 
Viswanadham et al. (1987), White and Speyer (1987), Frank (1991), and
Patton and Chen (1991a). 

A particularly useful and universal method of constructing detection sys- 
tems, whose basic task is the generation of a vector residue, is thus based on 
the state-space description of the monitored object and on state estimation 
or observation system design (Chow and Willsky, 1984; Chen and Patton, 
1999; Kowalczuk and Suchomski, 1998). Such a system can be arranged in an 
efficient way, which means that - being insensitive to disturbance signals and 
measurement noises affecting the object as well as to the structural and para- 
metric uncertainty of the process model - the estimator has a clear ability 
to expose in the generated residual vectors symptoms of the faults occurring 
in the process (Kowalczuk and Suchomski, 1999; Magni and Mouyon, 1994; 
Patton and Chen, 1991b). 

Though the concept of observation systems was introduced by Luenberger
(1966) several years after the publication of the solution to the optimal
estimation task by Kalman (1960), it is the Kalman filter (Athans, 1996) that
is a stochastic version of the Luenberger observer. A specific proposition
of detective Kalman filters suggested by Mehra and Peschon (1971) found
its continuation in many subsequent works (e.g., Mangoubi, 1998; Willsky,
1976; 1986). They definitely represent a stochastic system approach. On the
other hand, within an integral deterministic approach, based on parity
equations and classical diagnostic observers (Chow and Willsky, 1984), it can be
shown that analogous sets of residual vectors are attainable (Gertler, 1991;
Viswanadham et al., 1987).



5.2. Transfer function approach 

Some of the most classical tools for modelling dynamic systems are linear 
operator transfer functions (Brogan, 1991; Gertler, 1998) describing (in the 
domain of a respective complex operator s or z) the reaction of linear time- 
invariant systems with zero initial conditions. In this chapter we shall limit 
our deliberations to applications of simple whole models (Gertler, 1991; 1995), 
neglecting other practically useful structural models (Gertler et al, 1993; 
1995; Kowalczuk and Gertler, 1992), which employ partial transfer functions. 

The technique, generally known as the parity equation method, consists 
in designing suitable consistency or parity equations (relations), on the basis
of which a residual vector is generated, which ought to have a ‘zero’ value in 
correct, non-defective conditions. 



5.2.1. Residue generation 

A monitored multi-dimensional linear dynamic plant with a vector input
$u(k)$ and a vector output $y(k) \in \mathbb{R}^m$ can be described in the discrete
time domain by a nominal input-output model represented by a function of
the shift operator $q$:

$$y(k) = M(q)u(k), \tag{5.1}$$

where $M(q)$ is a transfer function matrix of realisable rational functions
$m_{ij}(q)$, defined for each output $i = 1, \dots, m$ and each input $j$ as

$$m_{ij}(q) = \frac{g_{ij}(q^{-1})}{h_{ij}(q^{-1})} = \frac{g^+_{ij}(q)}{h^+_{ij}(q)}, \tag{5.2}$$

where $g_{ij}(q^{-1})$ and $h_{ij}(q^{-1})$ are mutually prime polynomials of the variable
$q^{-1}$, while $g^+_{ij}(q)$ and $h^+_{ij}(q)$ are the corresponding polynomials of the shift
operator $q$. By introducing the least common multiple $h(q^{-1})$ of the denominator
monic polynomials $h_{ij}(q^{-1})$ of the elementary Single-Input Single-Output
(SISO) transfer functions (5.2) in the form

$$h(q^{-1}) = 1 + h_1 q^{-1} + \cdots + h_\nu q^{-\nu}, \tag{5.3}$$
the transfer function matrix can be specified as follows:

$$M(q) = \frac{G(q^{-1})}{h(q^{-1})} = \frac{G^+(q)}{h^+(q)}, \tag{5.4}$$

where the polynomial matrices are defined as

$$G(q^{-1}) = G_0 + G_1 q^{-1} + \cdots + G_\nu q^{-\nu}, \qquad G^+(q) = q^{\nu} G(q^{-1}), \tag{5.5}$$

with suitably determined matrix coefficients $G_0, G_1, \dots, G_\nu$, and the parameter
$\nu$ playing the role of the input-output order of the system (also called
the apparent order; Gertler, 1995).



System errors, faults and disturbances. Additive errors, which essentially
affect the functioning of the supervised process and which represent
disorders invoked by certain faults, defects, failures, noises and disturbances,
can be modelled as a reaction of some sub-systems of known transfer functions
to unknown input signals. Thus, by forming a suitable input error vector signal
$f(k) \in \mathbb{R}^v$, the input-output relationship of (5.1) extended by the model
error $f$ can be expressed by an Output Error (OE) model (Ljung, 1987):

$$y(k) = M(q)u(k) + S(q)f(k), \tag{5.6}$$

where the matrix transfer function of the error

$$S(q) = \frac{N(q^{-1})}{h(q^{-1})} = \frac{N^+(q)}{h^+(q)}, \tag{5.7}$$

and the matrices $N(q^{-1})$ and $N^+(q)$ are polynomial matrices defined analogously
to (5.5).

For simplicity, we shall keep the vector $f$ as an integral error or a fault
unit composed of all of the above-mentioned disordering signal sources. This 
issue will be discussed later on in this and the next chapters. At this point, 
it is worth emphasising that, in general, there is a clear - though principally 
subjective - distinction between faults and disturbances. Namely, faults are 
those unknown input error signals whose existence is pertinent for us (and 
we want to detect them), while disturbances are those unknown input error 
signals which are none of our concern (and we want to eliminate their effect). 

Assume that there are $\kappa$ strictly input errors $f_j(k)$ in the process model
(Gertler, 1995; 1998), which have strictly proper transfer functions $s_{\bullet j}(q)$,
each being a respective column of the matrix $S(q)$. This means that the delay
operator $q^{-1}$ can be factored out from the vector polynomial $n_{\bullet j}(q^{-1})$, being
a respective column of the matrix $N(q^{-1})$.
State-space perspectives. The transfer function model (5.6) can be expressed
in an $n$-dimensional state space:

$$\begin{aligned} x(k+1) &= A x(k) + B_u u(k) + E f(k), \\ y(k) &= C x(k) + D_u u(k) + F f(k), \end{aligned} \tag{5.8}$$

where $x(k) \in \mathbb{R}^n$ denotes a running state vector and all the matrices have
suitable dimensions: $A \in \mathbb{R}^{n \times n}$, $C \in \mathbb{R}^{m \times n}$, $E \in \mathbb{R}^{n \times v}$ and
$F \in \mathbb{R}^{m \times v}$, with $B_u$ and $D_u$ conformable with the input $u(k)$. By comparing
(5.6) and (5.8), we have

$$\begin{aligned} M(q) &= C(qI_n - A)^{-1}B_u + D_u, \\ S(q) &= C(qI_n - A)^{-1}E + F. \end{aligned} \tag{5.9}$$

The order $n$ of the above minimal state-space realisation, which is equivalent
to the order of a controllable and observable/reconstructable Kalman
canonical sub-system (DeCarlo, 1989) of a hypothetical process under consideration,
has a useful meaning of a pertinent 'true' order $n$ of the object.
This order can, in general, be greater than the degree of the denominator
polynomial (5.3): $n \ge \nu$. That may be possible because the model (5.8)
can independently represent various sub-systems $m_{ij}(q)$, along with their
repeated poles (or multiple eigenvalues), which have been left out of the
polynomial (5.3).

By describing the common factor avoided in the equations (5.3), (5.4)
and (5.6) by means of a polynomial $h_o^+(q)$ of degree $(n - \nu)$, the relationships
shown in (5.9) can be expressed as follows:

$$\begin{aligned} h_o^+(q)\,G^+(q) &= C \operatorname{adj}(qI_n - A)\, B_u + \det(qI_n - A)\, D_u, \\ h_o^+(q)\,N^+(q) &= C \operatorname{adj}(qI_n - A)\, E + \det(qI_n - A)\, F, \\ h_o^+(q)\,h^+(q) &= \det(qI_n - A). \end{aligned} \tag{5.10}$$
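As a small illustration of how the true order can exceed the input-output order, consider the following worked case. It is an example constructed here, not one taken from the cited works: two decoupled first-order channels that share the same pole at $q = 0.5$.

```latex
\begin{aligned}
M(q) &= \frac{1}{q - 0.5}\, I_2, \qquad h^+(q) = q - 0.5, \qquad \nu = 1, \qquad G^+(q) = I_2, \\
A &= 0.5\, I_2, \quad B_u = C = I_2, \quad D_u = 0 \quad\Longrightarrow\quad n = 2, \\
\det(qI_2 - A) &= (q - 0.5)^2 = h_o^+(q)\, h^+(q), \qquad h_o^+(q) = q - 0.5, \quad \deg h_o^+ = n - \nu = 1, \\
C \operatorname{adj}(qI_2 - A)\, B_u &= (q - 0.5)\, I_2 = h_o^+(q)\, G^+(q).
\end{aligned}
```

The realisation is controllable and observable, so it is minimal, yet the least common denominator counts the repeated pole only once.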



In an elementary instance of only simple poles in particular SISO sub- 
systems, the repeated poles can be easily detected by performing a partial 
fraction expansion of these sub-systems and a rank analysis of suitable matrix 
coefficients (Gilbert, 1963), whereas in the case of multiple poles in the sub- 
systems the Smith-McMillan factorisation (Kailath, 1980; McMillan, 1952) is 
used. 

Residual equations. In the analysed case the residue generator is a linear
discrete-time system which processes samples of the input and output signals
of the monitored object. For a given residual vector $r(k) \in \mathbb{R}^w$, such a system
can be represented in the following form (Gertler, 1998):

$$r(k) = \begin{bmatrix} V(q) & W(q) \end{bmatrix} \begin{bmatrix} u(k) \\ y(k) \end{bmatrix}, \tag{5.11}$$
where $V(q)$ and $W(q)$ are transfer functions submitted for design. Thus
making use of the OE model (5.6) produces a new relationship:

$$r(k) = [V(q) + W(q)M(q)]\,u(k) + W(q)S(q)f(k). \tag{5.12}$$

In order to make the residue $r(k)$ insensitive to the excitation $u(k)$,
keeping - at the same time - its sensitivity to the errors $f(k)$, we have to
assume that

$$V(q) = -W(q)M(q). \tag{5.13}$$

This brings the equations (5.12) and (5.11) to two useful forms (Gertler, 1995;
1998) of the residue generator. One is an internal form (an analytical formula
conditioned by the errors):

$$r(k) = W(q)S(q)f(k), \tag{5.14}$$

which shows how the errors influence the residue. And the other is an external
(algorithmic or computational) form:

$$r(k) = W(q)\,[y(k) - M(q)u(k)], \tag{5.15}$$

which describes the method of computing the residual vectors.

Note that the algorithm (5.15) can be interpreted as a result of a linear
transformation $W(q)$ of the following primary parity relations for the
fault-free model (5.1):

$$e(k) = y(k) - M(q)u(k) \in \mathbb{R}^m. \tag{5.16}$$

As the parametric matrix $V(q)$ has been fixed by virtue of (5.13), for a
practical implementation of the algorithm (5.15) we need to design suitably
the matrix $W(q)$ of the residue generator.
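The external form can be tried out on a toy case. The following is a minimal sketch, assuming a hypothetical first-order SISO plant $M(q) = b q^{-1}/(1 - a q^{-1})$, an additive output fault with $S(q) = 1$, and the trivial choice $W(q) = 1$; the parameter values and function names are illustrative only.

```python
def model_output(u, a, b):
    """Response of the nominal model M(q) = b q^-1 / (1 - a q^-1), zero initial conditions."""
    y, y_prev, u_prev = [], 0.0, 0.0
    for uk in u:
        yk = a * y_prev + b * u_prev
        y.append(yk)
        y_prev, u_prev = yk, uk
    return y


def primary_residual(u, y, a, b):
    """e(k) = y(k) - M(q) u(k): the measured output minus the nominal model response."""
    return [yk - ymk for yk, ymk in zip(y, model_output(u, a, b))]


# Hypothetical plant parameters and signals (assumptions for illustration).
a, b = 0.8, 0.5
u = [1.0] * 20
fault = [0.0] * 10 + [0.3] * 10          # additive output fault appearing at k = 10
y = [ym + f for ym, f in zip(model_output(u, a, b), fault)]

e = primary_residual(u, y, a, b)          # stays at zero until the fault appears
```

With $S(q) = 1$ and $W(q) = 1$ the residual simply reproduces the fault signal, which is the internal form $r(k) = W(q)S(q)f(k)$ seen from the outside.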

It is worth stressing here that the formulae (5.14) and (5.15) give the most
general forms (Gertler, 1991) of linear generators of residual signals and the
consistency relations (5.16), including diagnostic observers and Kalman
filters, discussed later on.

Designing the generator reaction. A desired response of a single residue
$r_i(k) \in \mathbb{R}$ to a chosen error $f_j(k) \in \mathbb{R}$ can be defined by the following
relationship:

$$r_i(k \mid f_j) = z_{ij}(q) f_j(k).$$

Hence, taking into account the internal residue description (5.14), we have

$$w_{i\bullet}(q)\, s_{\bullet j}(q) = z_{ij}(q),$$

and

$$w_{i\bullet}(q)\, S_i(q) = z_{i\bullet}(q), \tag{5.17}$$

where $z_{i\bullet}(q)$ stands for an aggregate of all response patterns of the $i$-th
residue in the form of a $k_i$-element row vector, $w_{i\bullet}(q)$ is a corresponding
row vector of design parameters, and an $(m \times k_i)$ matrix $S_i(q)$ consists of
suitably chosen (or all) columns $s_{\bullet j}(q)$ of the matrix transfer function $S(q)$
of the errors/faults.

The specification of the reaction of a single residue (5.17) is homogeneous
if its model $z_{i\bullet}(q)$ is a zero row vector. Among other, non-homogeneous designs
we can distinguish almost homogeneous specifications, which are characterised
by a single non-zero co-ordinate of the vector $z_{i\bullet}(q)$. With an $m$-dimensional
output vector $y(k) \in \mathbb{R}^m$, the specification of a single residue via
the model vector $z_{i\bullet}(q)$ of (5.17) cannot contain more than $m$ coordinates,
i.e., $k_i \le m$, of which at most $(m-1)$ can have a zero value. Furthermore, the
specification of all the $m$ coordinates of $z_{i\bullet}(q)$ is full, and settling $(m-1)$
elements is almost full (Gertler, 1995; 1998).

A residual vector of all the $w$ residues that is formed as the response of
the residue generator to a chosen single error $f_j(k)$ can be described as

$$r(k \mid f_j) = z_{\bullet j}(q) f_j(k). \tag{5.18}$$

On that account, all the vector specifications $z_{\bullet j}(q)$ can be given in an
aggregate framework as a transfer function matrix $Z(q)$:

$$Z(q) = \begin{bmatrix} z_{\bullet 1}(q) & \cdots & z_{\bullet v}(q) \end{bmatrix}. \tag{5.19}$$

Hence by virtue of (5.14) we have

$$W(q)S(q) = Z(q). \tag{5.20}$$



The structure of a (qualitative) generator reaction to a given $j$-th error
can be determined in terms of a respective signal coincidence - via a qualitative
Boolean pattern of reaction:

$$\hat{z}_{\bullet j} = B\{z_{\bullet j}(q)\} = \begin{bmatrix} \hat{z}_{1j} & \cdots & \hat{z}_{wj} \end{bmatrix}^T$$

with

$$\hat{z}_{ij} = \begin{cases} 1 & \text{if } z_{ij}(q) \neq 0, \\ 0 & \text{if } z_{ij}(q) = 0, \end{cases}$$

where ones indicate the sensitivity of particular elements of the residual vector
$r(k)$ to a given error $f_j(k)$ (Gertler, 1991; 1995; Gertler and Luo, 1989;
Gertler and Monajemy, 1995; Gertler and Singer, 1990; Gertler et al., 1993;
1995; Kowalczuk and Gertler, 1992).

The structure of a particular $i$-th residue can be expressed in a similar
way by applying the latter transformation with respect to $z_{i\bullet}(q)$ from (5.17).
A respective Boolean row pattern

$$\hat{z}_{i\bullet} = B\{z_{i\bullet}(q)\} = \begin{bmatrix} \hat{z}_{i1} & \cdots & \hat{z}_{ik_i} \end{bmatrix} \tag{5.21}$$

can, in general, be designed for $v > m$, which means that each row model
$z_{i\bullet}(q)$ has to relate to a different subset of errors, but the number of zeros
cannot exceed $(m-1)$ (the remaining non-zero responses are then not
defined).

Hence the structures of quantitative descriptions of the generator reaction
in terms of a single residue concerning, for instance,

• almost full homogeneous row specifications,

• full almost homogeneous row models (with an imposed single reaction),

can be expressed qualitatively by means of the following Boolean row patterns:

$$\underbrace{\begin{bmatrix} 0 & \cdots & 0 \end{bmatrix}}_{m-1} \qquad \text{and} \qquad \begin{bmatrix} 0 & \cdots & 0 & 1 & 0 & \cdots & 0 \end{bmatrix}.$$

Structured residues responding to different subsets of errors can be described
by different Boolean row patterns (Gertler, 1995), which means that
the Boolean column patterns of the resulting residual reaction (5.18) to a
single error $f_j(k)$ are also all different. Equivalently, these residual vectors belong
to different subspaces of the parity/residual space.
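The Boolean bookkeeping can be sketched in a few lines. The example below is an illustration under stated assumptions: the entries of `Z` are hypothetical static gains standing in for the transfer functions $z_{ij}(q)$, and the helper names are not from the source.

```python
def boolean_pattern(Z):
    """Incidence structure of a response matrix: 1 where z_ij is non-zero, else 0."""
    return [[1 if z != 0 else 0 for z in row] for row in Z]


def column_codes(B):
    """Boolean column patterns, one code per error."""
    return [tuple(col) for col in zip(*B)]


def is_structurally_isolable(B):
    """Single errors can be told apart if every column code is non-zero and unique."""
    codes = column_codes(B)
    return all(any(code) for code in codes) and len(set(codes)) == len(codes)


# Hypothetical response gains (rows: residues, columns: errors); one zero per row.
Z = [[0.0, 1.2, 0.7],
     [0.9, 0.0, 0.4],
     [1.1, 0.5, 0.0]]
B = boolean_pattern(Z)
isolable = is_structurally_isolable(B)

# A degenerate structure in which both errors excite every residue.
degenerate = is_structurally_isolable([[1, 1], [1, 1]])
```

Each row of `B` here has $m - 1 = 2$ ones and a single zero, so every residue is decoupled from exactly one error and the three column codes differ.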

Directional characteristics of the generator reaction to a single error $f_j(k)$
can be gained by specifying the residual vector (5.18) in the form of (Gertler
and Monajemy, 1995):

$$r(k \mid f_j) = \varphi_j\, \sigma_j(q) f_j(k), \tag{5.22}$$

where $\varphi_j$ is a fixed (direction) vector in the parity space, and $\sigma_j(q)$ denotes
a scalar transfer function of this error. Identical dynamics of all the residual
reactions to the analysed error $f_j(k)$ allow the preservation of the fixed
direction $\varphi_j$ in transient states.
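A fixed direction makes isolation a matter of comparing angles. The following sketch assumes hypothetical direction vectors and picks the one most nearly collinear with an observed residual; it illustrates the idea only and is not the decision procedure of the cited works.

```python
import math


def cosine(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def isolate(residual, directions):
    """Index of the direction most nearly collinear with the residual (sign ignored)."""
    scores = [abs(cosine(residual, phi)) for phi in directions]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical fault directions in a three-dimensional parity space.
directions = [(1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 2.0)]

# Residual produced by the second error with an instantaneous scalar response of -0.42.
r = [-0.42 * c for c in directions[1]]
j = isolate(r, directions)
```

Because the scalar response only scales the residual along its direction, the test is insensitive to the magnitude and sign of the response, which is what preserves isolation in transient states.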

We usually demand that the three basic design parameters be equal:
$v = w = m$. This makes up for the equalisation of the residue $r$ and error
$f$ spaces (in the form of a common space) and for using the full
non-homogeneous row model $z_{i\bullet}(q)$ (confer (5.17) through (5.20)) along with
$S_i(q) = S(q)$ and

$$Z(q) = \begin{bmatrix} z_{\bullet 1}(q) & \cdots & z_{\bullet v}(q) \end{bmatrix} = \Phi\, \Sigma(q),$$

where

$$\Phi = \begin{bmatrix} \varphi_1 & \cdots & \varphi_v \end{bmatrix}$$

and

$$\Sigma(q) = \operatorname{diag}\begin{bmatrix} \sigma_1(q) & \cdots & \sigma_v(q) \end{bmatrix}.$$

A diagonal (orthogonal) reaction results from assigning the residual directions
to an ortho-normal basis $\Phi = I_v$ of the common space:

$$Z(q) = \Sigma(q).$$
A uniform reaction is characterised by an identical dynamic response in
all orthogonal directions:

$$\sigma_1(q) = \cdots = \sigma_v(q) = \sigma(q),$$

$$Z(q) = \Sigma(q) = \sigma(q)\, I_v.$$

This is an ideal model, which allows multiple fault isolation (Gertler, 1995)
and which should be taken into account while designing structured residual
responses.

Note that the full non-homogeneous row design approach is naturally
shaped for the directional characteristics (5.22) of the residue generator, while
the full almost homogeneous row specifications and the almost full ones are
suitable for the structured (and diagonal) residual-vector generators.

Various practical illustrations of the design for exemplary dynamic systems,
as well as pertinent implementation details of the discussed approach
(concerning the existence, uniqueness, causality, stability, and numerical
complexity issues of the residue generator), can be found in the works of
Gertler (1995; 1998).

With reference to the previously mentioned differentiation between faults
and disturbances, it is clear that the reaction to the indicated disturbances
should, in general, be specified as zero. This naturally restricts to $(m-1)$
the number of disturbances from which the residues can be decoupled, and
diminishes the number of possible fault specifications.

Algorithmic properties. Taking into consideration (5.13) or (5.15) and
utilising (5.11), a single residue can be computed from

$$r_i(k) = w_{i\bullet}(q)\, y(k) + v_{i\bullet}(q)\, u(k),$$

where, according to (5.13),

$$v_{i\bullet}(q) = -w_{i\bullet}(q) M(q).$$

The row transfer functions $v_{i\bullet}(q)$ and $w_{i\bullet}(q)$ are, in general, real rational
functions of the shift operator $q$. Nevertheless, from a practical point of
view, it is advantageous to design the residue generator so as to shape the
parameter transfer functions $v_{i\bullet}(q)$ and $w_{i\bullet}(q)$ into polynomials of the variable
$q^{-1}$. This not only simplifies the structure of the generator algorithm
but also makes a statistical analysis of the residues simple (Gertler and
Monajemy, 1995). Thus such an assumption affects the design specification and
completion, as well as the properties of the resulting system implementation.
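A polynomial generator of this kind can be sketched for a toy case. The example assumes a hypothetical first-order SISO plant $M(q) = b q^{-1}/(1 - a q^{-1})$ with an additive output fault; choosing $w(q) = 1 - a q^{-1}$ gives $v(q) = -w(q)M(q) = -b q^{-1}$, so both are polynomials in $q^{-1}$. All names and numbers are illustrative.

```python
def simulate_output(u, f, a, b):
    """y(k) = ym(k) + f(k), with nominal response ym(k) = a*ym(k-1) + b*u(k-1)."""
    y, ym_prev, u_prev = [], 0.0, 0.0
    for uk, fk in zip(u, f):
        ym = a * ym_prev + b * u_prev
        y.append(ym + fk)
        ym_prev, u_prev = ym, uk
    return y


def parity_residual(u, y, a, b):
    """r(k) = y(k) - a*y(k-1) - b*u(k-1): a finite-memory (FIR) residue generator."""
    r = []
    for k in range(len(y)):
        y_prev = y[k - 1] if k > 0 else 0.0
        u_prev = u[k - 1] if k > 0 else 0.0
        r.append(y[k] - a * y_prev - b * u_prev)
    return r


# Hypothetical parameters and signals (assumptions for illustration).
a, b = 0.8, 0.5
u = [1.0] * 20
f = [0.0] * 10 + [0.3] * 10               # step fault on the output from k = 10
r = parity_residual(u, simulate_output(u, f, a, b), a, b)
```

Here the residual equals $f(k) - a f(k-1)$, so a step fault produces a leading spike followed by a smaller constant level; only two past samples are needed, and any white measurement noise would enter the residual as a short moving average.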

5.2.2. Properties of the system matrix 

Based on the state-space description (5.8) of the monitored object, a system
matrix (Fairman, 1998; MacFarlane and Karcanias, 1976; Rosenbrock, 1970)
of the error/fault channel (sub-system) defined for the error input $f(k)$ is
represented by the following construction:

$$T^+(q) = \begin{bmatrix} qI_n - A & -E \\ C & F \end{bmatrix}, \tag{5.23}$$

in which the block rows contain $n$ and $m$ rows and the block columns contain
$n$ and $v$ columns, respectively.



To elucidate certain implemental constraints of residue generating systems,
the subsequent analysis is limited to the case of (full specification)
$v = m$ of the square matrices $T^+(q)$ and $S(q)$.

The rank of the error system matrix $T^+(q)$ amounts (Gertler and
Monajemy, 1995) to

$$\operatorname{rank}(T^+(q)) = n + \operatorname{rank}(S(q))$$

for all $q \neq \lambda_j$, where $\lambda_j$, $j = 1, 2, \dots$, are poles of the system, which can be
seen as zeros of the polynomial $h^+(q)$ from (5.3)-(5.10).

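The inverse of the matrix $T^+(q)$ can be established as

$$T^+(q)^{-1} = \frac{1}{\gamma^+(q)} \begin{bmatrix} \Omega_A^+(q) & \Omega_E^+(q) \\ \Omega_C^+(q) & \Omega_F^+(q) \end{bmatrix}, \tag{5.24}$$

with block rows of $n$ and $v = m$ rows and block columns of $n$ and $m$ columns,
where

$$\Omega^+(q) = \operatorname{adj}(T^+(q)), \qquad \gamma^+(q) = \det(T^+(q)),$$

and $\gamma^+(q)$ is an invariant zero polynomial of the error system/channel. It
is a useful relationship since based on the applied partitioning of the
matrices (5.23)-(5.24) and by performing appropriate multiplications (Gertler,
1998; Kailath, 1980), we obtain

$$\begin{aligned}
\frac{\Omega_F^+(q)}{\gamma^+(q)} &= [C(qI_n - A)^{-1}E + F]^{-1}, \\
\frac{\Omega_C^+(q)}{\gamma^+(q)} &= -\frac{\Omega_F^+(q)}{\gamma^+(q)}\, C(qI_n - A)^{-1}, \\
\frac{\Omega_E^+(q)}{\gamma^+(q)} &= (qI_n - A)^{-1} E\, \frac{\Omega_F^+(q)}{\gamma^+(q)}, \\
\frac{\Omega_A^+(q)}{\gamma^+(q)} &= (qI_n - A)^{-1} \left( I_n + E\, \frac{\Omega_C^+(q)}{\gamma^+(q)} \right).
\end{aligned} \tag{5.25}$$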



Invariant zeros $\zeta_1, \zeta_2, \dots$ of the error system are roots of the characteristic
equation $\gamma^+(q) = 0$. By defining the system matrix for the $j$-th error
sub-channel induced by a single error $f_j(k)$:

$$T_j^+(q) = \begin{bmatrix} qI_n - A & -e_{\bullet j} \\ C & f_{\bullet j} \end{bmatrix}, \tag{5.26}$$

with block rows of $n$ and $m$ rows and block columns of $n$ and $1$ columns,
we presume that the value $q = \zeta_i$ determines an invariant zero of the
sub-system (5.26) if it reduces all $(n+1) \times (n+1)$ sub-determinants of the matrix
$T_j^+(q)$ to zero. Such zeros may not exist. However, if they do occur, they
also appear in any composed-error sub-system containing $f_j(k)$ as well as
in the full system $T^+(q)$ of the transmission of all the errors.

The degree of the polynomials related to the inverse of the error system
matrix $T^+(q)$ can be determined on the basis of the analysis of the
matrices (5.23)-(5.26).

The polynomial $\gamma^+(q)$ of invariant zeros results from the $(n+m) \times (n+m)$
determinant of (5.23), while the elements of the adjoint matrix $\Omega^+(q)$ arise
from the $(n+m-1) \times (n+m-1)$ determinants. Their polynomial degree is
determined by the largest diagonal minor of $(qI_n - A)$. It turns out (Gertler,
1998) that polynomial degrees depend both on the system order $n$ and the
number $\kappa$ of strictly input errors, which, having a strictly proper transfer
function, are also characterised by a zero column $f_{\bullet j} = 0_m$ in the
input-output transfer matrix $F$.

Obviously, the maximum degree of $\gamma^+(q)$ is $n$. Consider a Laplace expansion
of the determinant $\det(T^+(q))$ carried out with respect to the $j$-th error
column $[\,-e_{\bullet j}^T \;\; f_{\bullet j}^T\,]^T$. It is clear that the cofactors (signed minors) associated
with all the elements of the column $e_{\bullet j}$ lose a row of $(qI_n - A)$. Thus,
in the case of strictly input errors with $f_{\bullet j} = 0_m$, the polynomial degree is
diminished by one and, consequently,

$$\deg(\gamma^+(q)) \le n - \kappa. \tag{5.27}$$

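With argumentation similar as before applied with respect to the determinants
of the marked sub-matrices of $\Omega^+(q)$, it can be shown (Gertler,
1998) that

$$\deg(\omega_{Fj}^+(q)) \le \begin{cases} n - \kappa & \text{if } f_{\bullet j} \neq 0_m, \\ n - \kappa + 1 & \text{if } f_{\bullet j} = 0_m, \end{cases} \tag{5.28a}$$

$$\deg(\Omega_F^+(q)) \le \begin{cases} n - \kappa + 1 & \text{if } \kappa > 0, \\ n & \text{if } \kappa = 0, \end{cases} \tag{5.28b}$$

$$\deg(\omega_{Cj}^+(q)) \le \begin{cases} n - \kappa - 1 & \text{if } f_{\bullet j} \neq 0_m, \\ n - \kappa & \text{if } f_{\bullet j} = 0_m, \end{cases} \tag{5.28c}$$

$$\deg(\Omega_C^+(q)) \le \begin{cases} n - \kappa & \text{if } \kappa > 0, \\ n - 1 & \text{if } \kappa = 0, \end{cases} \tag{5.28d}$$

$$\deg(\Omega_E^+(q)) \le \begin{cases} n - \kappa & \text{if } \kappa > 0, \\ n - 1 & \text{if } \kappa = 0, \end{cases} \tag{5.28e}$$

$$\deg(\Omega_A^+(q)) \le \begin{cases} n - \kappa - 1 & \text{if } \kappa < n, \\ 0 & \text{if } \kappa = n, \end{cases} \tag{5.28f}$$

where the additionally given rows $\omega_{Fj}^+(q)$ and $\omega_{Cj}^+(q)$ of $\Omega^+(q)$ are associated
with particular error sub-channels.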
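A scalar instance may help to check these bounds. The worked case below is an illustration constructed here (it is not taken from the cited works), with $n = m = v = 1$ and scalar parameters $a$, $c$, $e$, $f$; the labels $\Omega_A^+$ and $\Omega_E^+$ for the upper blocks of the adjoint are assumed names.

```latex
\begin{aligned}
T^+(q) &= \begin{bmatrix} q - a & -e \\ c & f \end{bmatrix}, \qquad
\gamma^+(q) = \det T^+(q) = f\,(q - a) + c\,e, \\
\Omega^+(q) &= \operatorname{adj} T^+(q)
  = \begin{bmatrix} \Omega_A^+ & \Omega_E^+ \\ \Omega_C^+ & \Omega_F^+ \end{bmatrix}
  = \begin{bmatrix} f & e \\ -c & q - a \end{bmatrix}, \\
S(q) &= \frac{c\,e}{q - a} + f = \frac{\gamma^+(q)}{q - a}
  \quad\Longrightarrow\quad S(q)^{-1} = \frac{\Omega_F^+(q)}{\gamma^+(q)}, \\
f \neq 0 \;(\kappa = 0): &\quad \deg \gamma^+ = 1 = n, \quad \text{one invariant zero at } q = a - c\,e/f, \\
f = 0 \;(\kappa = 1): &\quad \gamma^+ = c\,e, \quad \deg \gamma^+ = 0 = n - \kappa, \quad \deg \Omega_F^+ = 1 = n - \kappa + 1 .
\end{aligned}
```

In the strictly input case the numerator $\Omega_F^+$ has a higher degree than the denominator $\gamma^+$, which is the source of the causality restriction on specifications for such errors.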

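The realisable form of the inverse of the error system matrix (5.24) can be
derived taking into account the effect of (5.27):

$$T^+(q)^{-1} = \frac{\Omega(q^{-1})}{\gamma(q^{-1})}, \tag{5.29}$$

where

$$\Omega(q^{-1}) = q^{-n+\kappa}\, \Omega^+(q), \tag{5.30}$$

$$\gamma(q^{-1}) = \gamma_0 + \gamma_1 q^{-1} + \cdots + \gamma_{n-\kappa}\, q^{-n+\kappa}. \tag{5.31}$$

The adjoint matrix $\Omega^+(q)$ is fully degenerated, as its rank at particular
invariant zeros is

$$\operatorname{rank}(\Omega^+(\zeta_i)) = 1 \quad \text{for } i = 1, \dots, n - \kappa, \tag{5.32}$$

while $\gamma^+(\zeta_i) = 0$ if $i = 1, \dots, n - \kappa$. This directly results from the fact
(Gantmakher, 1959) that any $(2 \times 2)$ determinant composed of four elements
of $\Omega^+(q)$ can be expressed as

$$\det \begin{bmatrix} \omega_{ac}^+ & \omega_{ad}^+ \\ \omega_{bc}^+ & \omega_{bd}^+ \end{bmatrix} = \omega_{ac}^+ \omega_{bd}^+ - \omega_{ad}^+ \omega_{bc}^+ = \pm\, t_{cd,ab}^+(q)\, \gamma^+(q),$$

where $\omega_{ac}^+(q)$, $\omega_{ad}^+(q)$, $\omega_{bc}^+(q)$ and $\omega_{bd}^+(q)$ are elements taken from any rows
$(a, b)$ and columns $(c, d)$ of $\Omega^+(q)$, while $t_{cd,ab}^+(q)$ denotes an $(n+m-2) \times
(n+m-2)$ sub-determinant of the matrix $T^+(q)$ gained by eliminating its
$(c, d)$ rows and $(a, b)$ columns. Since the invariant zero polynomial $\gamma^+(q)$
can be factored out from any $(2 \times 2)$ determinant of $\Omega^+(q)$, all the columns
(and rows) of $\Omega^+(q)$ are linearly dependent at $q = \zeta_i$, $i = 1, \dots, n - \kappa$.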
5.2.3. Non-homogeneous residual reaction models

Taking into account non-homogeneous reference models (Gertler, 1995; 1998),
we shall now consider full non-homogeneous (FnH), full almost homogeneous
(FaH) and almost full non-homogeneous (aFnH) design specifications.

Full non-Homogeneous specification (FnH). The design equation of a
single residue is given by

$$w_{i\bullet}(q)\, S(q) = z_{i\bullet}(q),$$

where, for simplicity, we assume that $S(q)$ is an $(m \times m)$ transfer function
matrix. Then the solution in terms of design parameter vectors and matrix
is

$$w_{i\bullet}(q) = z_{i\bullet}(q)\, S(q)^{-1},$$

$$W(q) = Z(q)\, S(q)^{-1}.$$
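The algebra of this solution can be checked numerically at a single operating point. The sketch below assumes hypothetical constant $2 \times 2$ matrices standing in for $S(q)$ and $Z(q)$ evaluated at one fixed value of $q$, and uses exact rational arithmetic; it is not a design procedure for the full transfer functions.

```python
from fractions import Fraction as Fr


def inv2(S):
    """Exact inverse of a 2 x 2 matrix; fails when S is singular."""
    (a, b), (c, d) = S
    det = a * d - b * c
    if det == 0:
        raise ValueError("S is singular, so the solution is not unique")
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul(X, Y):
    """Plain matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


# Hypothetical values of S(q) and of a diagonal specification Z(q) at one fixed q.
S = [[Fr(2), Fr(1)], [Fr(1), Fr(3)]]
Z = [[Fr(1), Fr(0)], [Fr(0), Fr(1)]]

W = matmul(Z, inv2(S))        # W = Z S^-1
check = matmul(W, S)          # should reproduce Z, i.e. W S = Z
```

A diagonal `Z` means that each residue responds to exactly one error, so `W` has to undo the cross-coupling contained in `S`.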

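Taking into account (5.25) and (5.9) we have

$$\frac{\Omega_F^+(q)}{\gamma^+(q)} = \frac{\Omega_F(q^{-1})}{\gamma(q^{-1})} = S(q)^{-1}, \tag{5.33}$$

$$\frac{\Omega_C^+(q)}{\gamma^+(q)} = \frac{\Omega_C(q^{-1})}{\gamma(q^{-1})} = -S(q)^{-1} C(qI_n - A)^{-1}, \tag{5.34}$$

and by virtue of (5.13) and (5.9):

$$V(q) = -W(q)\,(C(qI_n - A)^{-1}B_u + D_u). \tag{5.35}$$

Hence from (5.33) and (5.34) we have

$$W(q) = \frac{Z(q)\, \Omega_F^+(q)}{\gamma^+(q)} \tag{5.36a}$$

and/or

$$w_{i\bullet}(q) = \frac{z_{i\bullet}(q)\, \Omega_F(q^{-1})}{\gamma(q^{-1})}. \tag{5.36b}$$

Now, based on (5.35) and (5.33), we have

$$V(q) = -Z(q)\, S(q)^{-1} C(qI_n - A)^{-1} B_u - W(q) D_u,$$

and further from (5.34) and (5.36):

$$V(q) = \frac{Z(q)\,(\Omega_C^+(q) B_u - \Omega_F^+(q) D_u)}{\gamma^+(q)}, \tag{5.37a}$$

$$v_{i\bullet}(q) = \frac{z_{i\bullet}(q)\,(\Omega_C(q^{-1}) B_u - \Omega_F(q^{-1}) D_u)}{\gamma(q^{-1})}. \tag{5.37b}$$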

The existence and uniqueness of solutions for arbitrary non-homogeneous
design specification is ensured by the non-singularity¹ of transfer function
matrices:

$$\operatorname{rank}(S(q)) = m. \tag{5.38}$$

A non-unique solution can be found in the case of rank defect (Gertler, 1995);
for instance, zero response specification is accessible for several errors even if
their respective columns of $S(q)$ are linearly dependent². Therefore, in such
cases, some additional specifications/restrictions have to be imposed on the
design.

The causality (realisability) of the residue generator (5.36)-(5.37) cannot
be obtained for the strictly input errors ($f_{\bullet j} = 0_m$) when, by virtue of (5.28a)
and (5.30), we have

$$\omega_{Fj}(q^{-1}) = \omega_{Fj}^{(-1)}\, q + \omega_{Fj}^{(0)} + \omega_{Fj}^{(1)}\, q^{-1} + \cdots.$$

Therefore any specification $z_{ij}(q)$ concerning a strictly input error must be
either zero or delayed by $q^{-1}$ (the system response cannot be immediate).
This is a simple consequence of the delayed object response as well as the
exposition of the general causality principle known from the control system
design literature, advising that the compensation of plant dynamics is not
always possible.

The stability of the residue generator (5.11) designed according to (5.36)
and (5.37) depends on the polynomial $\gamma^+(q)$ and its zeros $(\zeta_1, \zeta_2, \dots)$, being
the invariant zeros of the system (5.23) under the influence of the errors $f_j(k)$
(while the stability of the object itself (5.6) is irrelevant). This means that
any unstable zero $\zeta_i$, with $|\zeta_i| > 1$, must be cancelled by an additional factor
(Gertler, 1998) included in the design specification $Z(q) = [z_{ij}(q)]$. For a
polynomial realisation (with a Finite Impulse Response, the so-called FIR
systems), it is necessary to cancel the whole polynomial $\gamma^+(q)$, which assures
the generator's stability at the same time.

Note that the above cancellations take place solely within the synthesis
phase of the residue generator design procedure (5.11), and have nothing to
do with hardware cancellations avoided in controller implementation. Thus,
to make sure that numerical inaccuracies will not lead to internal-generator
instability, the cancellations should be performed analytically.

¹ In the sense of zero in the field of real rational polynomial functions or in the ring
(with identity) of real polynomials.

² The referencing notion of linear independence is suitably defined within the field
of real rational polynomial functions or the ring of real polynomials (see also the
following material).
The complexity of the computational algorithm results, in principle, from
the formulae (5.27)-(5.31). Clearly, both the design specifications of the
generator reaction to the errors and the method of algorithm implementation
influence the complexity of the system. Nevertheless, on the assumption that
there are no zero-pole cancellations and that the postulated specifications
allow generator realisability we have

• the denominator degree defined by (5.27),

• the numerator degree resulting from (5.28a,b).

With Full non-Homogeneous specifications (FnH) of the generator reaction
the design can be performed using the simplest way of cancelling the
unstable or all non-zero invariant zeros of the error system (as potential
poles of the generator) by including them in all specifying transfer functions.

Stabilisation can be obtained by decomposing the invariant zero
polynomial

$$\gamma(q^{-1}) = \gamma'(q^{-1})\, \gamma''(q^{-1})$$

into a stable $\gamma'(q^{-1})$ and unstable $\gamma''(q^{-1})$ part and modifying the original
specifications $z^*_{i\bullet}(q)$ as follows:

$$z_{i\bullet}(q) = \gamma''(q^{-1})\, z^*_{i\bullet}(q). \tag{5.39}$$

Thus instead of (5.36) and (5.37) we have

$$w_{i\bullet}(q) = \frac{z^*_{i\bullet}(q)\, \Omega_F(q^{-1})}{\gamma'(q^{-1})}, \qquad v_{i\bullet}(q) = \frac{z^*_{i\bullet}(q)\,(\Omega_C(q^{-1}) B_u - \Omega_F(q^{-1}) D_u)}{\gamma'(q^{-1})}. \tag{5.40}$$

In such a way not only is a stable algorithm synthesized without superfluous
cancellations but also the order of the residue generator is diminished. This
is, however, obtained at the cost of an entangled error-system reaction
model (5.39), which can result, for instance, in delaying the moment of error
detection.
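The effect of absorbing an unstable invariant zero into the specification can be sketched on a toy error channel. The example assumes a hypothetical scalar channel $S(q) = (1 - 2q^{-1})/(1 - 0.5q^{-1})$ with an unstable zero at $q = 2$; it compares the naive inverse generator $w(q) = 1/S(q)$ with the polynomial generator $w(q) = 1 - 0.5q^{-1}$, for which the residual response becomes $(1 - 2q^{-1})f(k)$.

```python
def error_channel(f, zero=2.0, pole=0.5):
    """e(k) = S(q) f(k) with S(q) = (1 - zero q^-1) / (1 - pole q^-1)."""
    e, e_prev, f_prev = [], 0.0, 0.0
    for fk in f:
        ek = pole * e_prev + fk - zero * f_prev
        e.append(ek)
        e_prev, f_prev = ek, fk
    return e


def inverse_generator(e, zero=2.0, pole=0.5):
    """w(q) = 1/S(q): exact in theory, but it has an unstable pole at q = zero."""
    r, r_prev, e_prev = [], 0.0, 0.0
    for ek in e:
        rk = zero * r_prev + ek - pole * e_prev
        r.append(rk)
        r_prev, e_prev = rk, ek
    return r


def fir_generator(e, pole=0.5):
    """w(q) = 1 - pole q^-1: the unstable zero is left in the specified response."""
    return [ek - pole * (e[k - 1] if k else 0.0) for k, ek in enumerate(e)]


f = [0.0] * 5 + [1.0] * 45              # step fault from k = 5
e = error_channel(f)
e_noisy = list(e)
e_noisy[0] += 1e-9                      # tiny perturbation of the first sample

r_inv = inverse_generator(e_noisy)      # the perturbation is amplified as 2^k
r_fir = fir_generator(e_noisy)          # stays bounded: f(k) - 2 f(k-1) plus the perturbation
```

The polynomial generator pays for its stability with a more entangled reaction: the step fault shows up as a spike of $+1$ followed by a level of $-1$ rather than as a clean copy of $f(k)$.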

A polynomial error reaction is accessible when instead of (5.39) we apply

$$z_{i\bullet}(q) = \gamma(q^{-1})\, z^*_{i\bullet}(q).$$

This specification also makes the transfer functions of particular errors even
more complicated, but the residue-generator algorithm (5.40) itself is clearly
simplified:

$$w_{i\bullet}(q) = z^*_{i\bullet}(q)\, \Omega_F(q^{-1}),$$

$$v_{i\bullet}(q) = z^*_{i\bullet}(q)\,(\Omega_C(q^{-1}) B_u - \Omega_F(q^{-1}) D_u).$$
The invariant zeros of a single $j$-th error sub-channel are roots of all
but the $j$-th row of $\Omega_F(q^{-1})$ and $\Omega_C(q^{-1})$. This means that they ought to
be included only in the specification of the response to the $j$-th error (in
all residual reaction row-models). Similarly, 'confirmed' zeros of sub-systems,
transferring a group of errors, need to be included only in specifications
corresponding to these sub-systems (Gertler, 1995).

The polynomial simplicity of the residue generator algorithm can also 
be gained by ‘tuning’ each single element of the row specifications so as to 
obtain the desired cancellation (Gertler, 1995; 1998). Note that in the case of 
directional characteristics of the specifications (5.22), fixing the dynamics of a 
specified element determines also the direction of the transient response of the 
residue generator to its corresponding error in the parity space. Apparently, 
the presence of sub-system zeros often entails using special design procedures 
(Gertler, 1995; 1998). 



Full almost Homogeneous specification (FaH). Considering the following
simple row model of specifications:

$$z_{i\bullet}(q) = \begin{bmatrix} 0 & \cdots & 0 & z_{ij}(q) & 0 & \cdots & 0 \end{bmatrix}, \tag{5.41}$$

the generator algorithm (5.36)-(5.37) can be shown in a facilitated form:

$$w_{i\bullet}(q) = \frac{z_{ij}(q)\, \omega_{Fj}(q^{-1})}{\gamma(q^{-1})}, \qquad v_{i\bullet}(q) = \frac{z_{ij}(q)\,(\omega_{Cj}(q^{-1}) B_u - \omega_{Fj}(q^{-1}) D_u)}{\gamma(q^{-1})}. \tag{5.42}$$

It is clear that $z_{ij}(q)$ must contain all undesirable/unstable zeros submitted
for cancellation - with the effect similar to (5.40). If the error sub-system
without the $j$-th error has zeros, they appear in both row vectors
$\omega_{Fj}(q^{-1})$ and $\omega_{Cj}(q^{-1})$. Thus they need not be embodied into the
specification (Gertler, 1995; 1998).



Almost-Full non-Homogeneous specification (aFnH). If the specification
is not full then the resulting freedom can be accordingly used in the
ordinary course of generator design - having in mind the basic idea of
converting the non-square system into a square one (Gertler, 1998). With an
almost full row model of a residual reaction $z_{i\bullet}(q)$, there are $m - 1$ error
responses, of which at least one is not zero. In that case the simplest method
of formal design description is to introduce an extra error input (which may
as well be a control/measured input) resulting in

$$w_{i\bullet}(q) \begin{bmatrix} S^*(q) & s_{\bullet m}(q) \end{bmatrix} = \begin{bmatrix} z^*_{i\bullet}(q) & z_{im}(q) \end{bmatrix},$$

where $S^*(q)$ is the original $m \times (m-1)$ matrix transferring the errors to
the plant output (5.6), and $z^*_{i\bullet}(q)$ describes the original specification, while
$s_{\bullet m}(q)$ denotes a column independent of the ones from $S^*(q)$, and $z_{im}(q)$
portrays an extra specification. The extra error may represent a fictitious
error or a factual (control or measured) input. Then the solution can be
sought with the use of the above method (FnH).



5.2.4. Homogeneous residual reaction models 

From a functional point of view, homogeneous specifications used in residue
generator design represent the feature of de-coupling the intended system
from the impact of plant disturbances. Such error signals are undesirable
and unmeasurable. Thus their influence should be eliminated (or, at least,
minimised, as shown in the following sections and the next two chapters). On 
the other hand, from a procedural viewpoint, homogeneous itemisations are 
generally less demanding than non-homogeneous ones, and, basically, after a 
suitable conversion, can be accordingly treated within the non-homogeneous 
design approach. 

Full Homogeneous specification (FH). Because the full homogeneous
statement (FH) of the problem symbolised by a homogeneous equation

$$w_{i\bullet}(q)\, S(q) = z_{i\bullet}(q) = 0^T \tag{5.43}$$

formally leads to a useless trivial solution $w_{i\bullet}(q) = 0^T$, we shall now
concentrate only on almost full (aFH) homogeneous specifications of the residual
reaction model.

At this point it is worth noticing that two row functions $w_{i\bullet}(q)$ and
$\tilde{w}_{i\bullet}(q)$, being in the relationship $\tilde{w}_{i\bullet}(q) = a(q)\, w_{i\bullet}(q)$, where $a(q)$ is a
(rational) polynomial transfer function, are linearly dependent in the (rational)
polynomial sense:

$$\tilde{w}_{i\bullet}(q) - a(q)\, w_{i\bullet}(q) = 0^T,$$

and can be considered as similar transformations/solutions (Gertler, 1998)
of the same equation:

$$\tilde{w}_{i\bullet}(q)\, S(q) = a(q)\, w_{i\bullet}(q)\, S(q) = 0^T.$$



Almost Full Homogeneous specification (aFH). With an original almost
full homogeneous reaction model, $S(q)$ is an $m \times (m-1)$ matrix,
while

$$ z_{i\bullet}(q) = [\, z_{i1} \;\; \cdots \;\; z_{i,m-1} \,] = 0^{\mathrm T}_{m-1}. $$

Clearly, such a system has no unique solution.



Solving (aFH) by augmentation to (FaH). With a simple augmentation
of design assumptions by an extra non-zero specification, the discussed setting
of the type (aFH) is converted into the full almost homogeneous specification
(FaH), the solution of which has been given in (5.42).

As evidently seen in the design description (5.42), all potential solu- 
tions are similar, and a multiplicity of solutions arise from the scaling factor 
^ij{Q ) being a consequence of augmentation, and from the fact that 
u)Fj{q~^) and ucj{q~^) are polynomials completely determined by the orig- 
inal dynamic system (homogeneously specified). 



Solving (aFH) by decomposition to (aFnH). The design of the residue
generator for this type (aFH) of homogeneous specifications can be easily
converted to the non-homogeneous variant (aFnH). Such a method (Gertler,
1995; 1998), developed around a selected element of the designed row $w_{i\bullet}(q)$,
consists in transforming the $m \times (m-1)$ homogeneous problem into an
$(m-1) \times (m-1)$ non-homogeneous one. Let us present (5.43) in the following
decomposed form:

$$ [\, w^{j}_{i\bullet}(q) \;\; w_{ij}(q) \,] \begin{bmatrix} S^{j}(q) \\ s_{j\bullet}(q) \end{bmatrix} = 0^{\mathrm T}_{m-1}, \tag{5.44} $$

where $w^{j}_{i\bullet}(q)$ denotes the row vector $w_{i\bullet}(q)$ without its $j$-th coordinate and
$S^{j}(q)$ represents the matrix $S(q)$ without its $j$-th row $s_{j\bullet}(q)$. The equation (5.44)
can then be re-written as:

$$ w^{j}_{i\bullet}(q)\,S^{j}(q) = -w_{ij}(q)\,s_{j\bullet}(q). $$

In terms of the external structure, the above resembles the full
non-homogeneous setting (FnH), having the solution (5.33). Thus, by assigning
a certain polynomial or rational function to the selected element $w_{ij}(q)$ of
the design, we find the following analogy of (5.33):

$$ w^{j}_{i\bullet}(q) = -w_{ij}(q)\,s_{j\bullet}(q)\,S^{j}(q)^{-1}, \tag{5.45} $$

with similar computational consequences. Moreover, from the substitution
of (5.45) into (5.44) it is clear that all possible solutions, depending on various
forms of $w_{ij}(q)$, are similar.
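The decomposition can be illustrated numerically. The sketch below is only an illustration under a strong simplifying assumption: the matrix $S$ is taken as a constant (static) $m \times (m-1)$ matrix instead of a matrix of transfer functions in $q$, and the function name `solve_afh_by_decomposition` as well as the numbers are hypothetical. It applies the static counterpart of (5.45) and checks that two different choices of the free element $w_{ij}$ give proportional (similar) solutions.

```python
import numpy as np

def solve_afh_by_decomposition(S, j, w_ij=1.0):
    """Static analogue of (5.45): find a row w with w @ S = 0.

    S    : constant m x (m-1) matrix (assumption: no dependence on q)
    j    : index of the selected element of w that is assigned freely
    w_ij : value assigned to that element
    """
    m = S.shape[0]
    keep = [r for r in range(m) if r != j]
    S_j = S[keep, :]      # S without its j-th row, (m-1) x (m-1)
    s_j = S[j, :]         # the removed j-th row
    # w^j S^j = -w_ij s_j  is solved in the transposed form
    w_rest = np.linalg.solve(S_j.T, -w_ij * s_j)
    w = np.empty(m)
    w[keep] = w_rest
    w[j] = w_ij
    return w

# Hypothetical 3 x 2 example
S = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [3.0, -1.0]])
w1 = solve_afh_by_decomposition(S, j=2, w_ij=1.0)
w2 = solve_afh_by_decomposition(S, j=2, w_ij=-2.5)
print(w1, w1 @ S)   # the product should vanish up to rounding
print(w2 / w1)      # constant ratio: the two solutions are similar
```

In the dynamic case the same algebra would have to be carried out over polynomial or rational functions of $q$, which this sketch does not attempt.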

By following the prescriptions of (5.9) and (5.23)-(5.25), the settlement 
(5.45) can be shown in the form (Gertler, 1995; 1998): 



<(?) = 






(5.46) 



where 0^(g ^), 0^(^ ^) and j^{q ^) have been computed from the matrix 
5^(g). 

The stability of the generator can be obtained by cancelling all unstable
poles of $\gamma^{j}(q^{-1})$ using the design polynomial $w_{ij}(q)$. Furthermore, the
generator's polynomial response will result from including the whole denominator
$\gamma^{j}(q^{-1})$ in this design polynomial.

The analysis of the system of the output-signal filtering $v_{i\bullet}(q)$ leads to
similar conclusions. By virtue of (5.13) we have



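$$ v_{i\bullet}(q) = -[\, w^{j}_{i\bullet}(q) \;\; w_{ij}(q) \,] \begin{bmatrix} M^{j}(q) \\ m_{j\bullet}(q) \end{bmatrix}, \tag{5.47} $$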



which on the basis of (5.45), (5.9) and (5.25) can be expressed as (Gertler, 
1995): 



Vi,(q) 



-Wij{q)dj,{q ^) 






(5.48) 



where 



dj,(q ^) = [ Cj, 







4(9 

4(9“^) 





1 

1 




-Di \ 



The causality of the residue generator specified as {aFH) is assured 
because in the formulae (5.46) and (5.48) non-causal rows of the matrix 
are eliminated by zero- valued elements of the row /j*. Furthermore 
(confer also the estimates (5.27)-(5.28) Gertler, 1998), the degree of poly- 
nomial transformation, determining the complexity of the residue-generator 
algorithm, is limited as follows: 



deg {wi,{q)) = deg {vi,{q)) <n- k. (5.49) 



5.3. Parity space approach 

The approach proposed by Chow and Willsky (1984) can presently be consid- 
ered a standard method of generating dynamic parity equations. It is founded 
on the description of the supervised object in a state-space and on suitable 
static transformations of this model. Because (similarly to the case of the 
transfer-function approach discussed previously) the sets of parity or consis- 
tency relations lead to multi-dimensional residues, which can be interpreted 
as elements of some (linear) vector space, this approach is generally referred 
to as the method of the parity space. 

The model of a discrete-time system taken into account this time has the
form:

$$ x(k+1) = A x(k) + B_u u(k) + B_d d(k) + E f(k), \tag{5.50} $$

$$ y(k) = C x(k) + D_u u(k) + D_d d(k) + F f(k), \tag{5.51} $$

where $x(k) \in \mathbb{R}^{n}$ denotes a state vector, $u(k) \in \mathbb{R}^{n_u}$ stands for a control
signal, $y(k) \in \mathbb{R}^{m}$ is a measurable output signal, $d(k) \in \mathbb{R}^{n_d}$ describes an
unmeasurable disturbance or perturbation signal (including both disturbance
and measurement noise signals) affecting the object, while $f(k) \in \mathbb{R}^{v}$ models
the fault (defects or permanent failures). The matrices of (5.50) and (5.51)
have suitable dimensions, as follows: $A \in \mathbb{R}^{n \times n}$, $C \in \mathbb{R}^{m \times n}$, $B_u \in \mathbb{R}^{n \times n_u}$,
$D_u \in \mathbb{R}^{m \times n_u}$, $B_d \in \mathbb{R}^{n \times n_d}$, $D_d \in \mathbb{R}^{m \times n_d}$, $E \in \mathbb{R}^{n \times v}$ and $F \in \mathbb{R}^{m \times v}$. By
piling up the above formulae for consecutive discrete-time moments, from the
furthest moment $(k-s)$ until the running time $k$, where $s > 0$ is a design
parameter (limited memory length), we obtain the following aggregate form
of the description of the system's output behaviour in the time window of
the length $(s+1)$:

$$ y_s(k) = C_s x(k-s) + D_{us} u_s(k) + D_{ds} d_s(k) + D_{fs} f_s(k), \tag{5.52} $$

where $y_s(k) \in \mathbb{R}^{(s+1)m}$, $u_s(k) \in \mathbb{R}^{(s+1)n_u}$, $d_s(k) \in \mathbb{R}^{(s+1)n_d}$ and
$f_s(k) \in \mathbb{R}^{(s+1)v}$ are vectors whose coordinates are particular signals, suitably
arranged:





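$$ y_s(k) = \begin{bmatrix} y(k-s) \\ y(k-s+1) \\ \vdots \\ y(k) \end{bmatrix}, \quad
   u_s(k) = \begin{bmatrix} u(k-s) \\ u(k-s+1) \\ \vdots \\ u(k) \end{bmatrix}, $$

$$ d_s(k) = \begin{bmatrix} d(k-s) \\ d(k-s+1) \\ \vdots \\ d(k) \end{bmatrix}, \quad
   f_s(k) = \begin{bmatrix} f(k-s) \\ f(k-s+1) \\ \vdots \\ f(k) \end{bmatrix}, $$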



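while $C_s \in \mathbb{R}^{(s+1)m \times n}$, $D_{us} \in \mathbb{R}^{(s+1)m \times (s+1)n_u}$, $D_{ds} \in \mathbb{R}^{(s+1)m \times (s+1)n_d}$ and
$D_{fs} \in \mathbb{R}^{(s+1)m \times (s+1)v}$ are matrices of the following structures:

$$ C_s = \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{s} \end{bmatrix}, \quad
   D_{us} = \begin{bmatrix}
     D_u & O_{m \times n_u} & \cdots & O_{m \times n_u} \\
     CB_u & D_u & \cdots & O_{m \times n_u} \\
     \vdots & \vdots & \ddots & \vdots \\
     CA^{s-1}B_u & CA^{s-2}B_u & \cdots & D_u
   \end{bmatrix}, $$

$$ D_{ds} = \begin{bmatrix}
     D_d & O_{m \times n_d} & \cdots & O_{m \times n_d} \\
     CB_d & D_d & \cdots & O_{m \times n_d} \\
     \vdots & \vdots & \ddots & \vdots \\
     CA^{s-1}B_d & CA^{s-2}B_d & \cdots & D_d
   \end{bmatrix}, \quad
   D_{fs} = \begin{bmatrix}
     F & O_{m \times v} & \cdots & O_{m \times v} \\
     CE & F & \cdots & O_{m \times v} \\
     \vdots & \vdots & \ddots & \vdots \\
     CA^{s-1}E & CA^{s-2}E & \cdots & F
   \end{bmatrix}. $$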



The residual vector, determined on the basis of the available data, is
defined as

$$ r(k) = W\,(y_s(k) - D_{us} u_s(k)), \tag{5.53} $$

with a weighting matrix $W \in \mathbb{R}^{w \times (s+1)m}$ being a design parameter (Chen
and Patton, 1999; Chow and Willsky, 1984; Gertler, 1998; 2000; Gertler and
Singer, 1990; Lou et al., 1986). Taking into consideration (5.52) we obtain
the following form of this vector:

$$ r(k) = W C_s x(k-s) + W D_{ds} d_s(k) + W D_{fs} f_s(k), $$

in which, apart from a principal component $W D_{fs} f_s(k)$ of great diagnostic
significance that carries information about the fault, we have a weighted state
vector $W C_s x(k-s)$ and a component $W D_{ds} d_s(k)$ embracing the effects of
the disturbances. The basic idea of the method of parity relations (Basseville,
1988; Chow and Willsky, 1984; Gertler, 1998; Mironovski, 1980) consists in
such a construction of the weighting matrix $W$ that the residual vector $r(k)$
described by the parity equation (5.53) is made independent of the state of
the system under supervision:

$$ W C_s x(k-s) = 0_w, \quad \forall x(k-s), $$

guaranteeing, at the same time, the possibility of detecting the fault. This
means that the matrix $W$ has to meet the following restrictions:

$$ W C_s = O_{w \times n}, \tag{5.54} $$

$$ W D_{fs} \neq O_{w \times (s+1)v}. \tag{5.55} $$



The rows of the matrix $W$, being a solution of the equation (5.54), have to
belong to the left kernel (null subspace) of the matrix $C_s$. The necessary and
sufficient condition for the existence of such a non-zero matrix $W$ is thus the
fulfilment of the inequality $\dim(\mathrm{Ker}(C_s^{\mathrm T})) > 0$. By taking into account the
above form of the matrix $C_s$ we can easily observe that this condition can
always be satisfied by taking a large enough value of the parameter $s$, referred
to as the order of the parity equation (5.53). For practical reasons, originating
in the postulate of diminishing computational costs of the implementation of
the designed diagnostic generator, we should search for solutions of a possibly
low order $s$. Inequality constraints for a minimal permissible value $s_0$ of this
order were given by Mironovski (1979; 1980):

$$ \frac{\mathrm{rank}(M_o)}{\mathrm{rank}(C)} \le s_0 \le \mathrm{rank}(M_o) - \mathrm{rank}(C) + 1, $$

where $M_o \in \mathbb{R}^{nm \times n}$ denotes (see Appendix in the next chapter) an
observability matrix of the pair $(A, C)$. If the pair $(A, C)$ is completely observable
and the matrix $C$ has a full row rank ($\mathrm{rank}(C) = m \le n$), the above
inequalities take the following form: $n/m \le s_0 \le n - m + 1$.

It should be emphasised that an undue reduction of the parity-equation 
order is not recommended at all. This is so, with regard to a disadvantageous 
loss of robustness of the respective numerical FDI procedures to perturbations 
in the processed data (Lou et al, 1986). 
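The construction of a parity matrix $W$ satisfying (5.54) can be sketched in a few lines of NumPy. This is a minimal sketch, not the procedure of any cited author: the plant matrices, the window length $s = 2$ and the helper names `stacked_matrices` and `left_null_space` are assumptions chosen for illustration, disturbances are omitted, and the left kernel of $C_s$ is taken from a singular value decomposition. A fault is imitated by a bias added to the newest output sample.

```python
import numpy as np

def stacked_matrices(A, B, C, D, s):
    """Build C_s and D_us of (5.52) for a window of length s + 1."""
    m, n = C.shape
    nu = B.shape[1]
    Cs = np.vstack([C @ np.linalg.matrix_power(A, i) for i in range(s + 1)])
    Dus = np.zeros(((s + 1) * m, (s + 1) * nu))
    for r in range(s + 1):
        for c in range(r + 1):
            blk = D if r == c else C @ np.linalg.matrix_power(A, r - c - 1) @ B
            Dus[r * m:(r + 1) * m, c * nu:(c + 1) * nu] = blk
    return Cs, Dus

def left_null_space(M, tol=1e-10):
    """Rows of the result form an orthonormal basis of {w : w M = 0}."""
    U, sv, _ = np.linalg.svd(M)
    rank = int(np.sum(sv > tol))
    return U[:, rank:].T

# Hypothetical second-order plant with one input and one output
A = np.array([[0.5, 1.0], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
D = np.zeros((1, 1))
s = 2

Cs, Dus = stacked_matrices(A, B, C, D, s)
W = left_null_space(Cs)                 # satisfies W Cs = 0, cf. (5.54)

# Fault-free data over the window, oldest sample first
rng = np.random.default_rng(0)
x = rng.standard_normal(2)
u_seq = rng.standard_normal((s + 1, 1))
ys = []
for k in range(s + 1):
    ys.append(C @ x + D @ u_seq[k])
    x = A @ x + B @ u_seq[k]
y_s = np.concatenate(ys)
u_s = u_seq.reshape(-1)

r_ok = W @ (y_s - Dus @ u_s)            # residual (5.53), no fault
y_faulty = y_s.copy()
y_faulty[-1] += 1.0                     # sensor bias on the newest sample
r_fault = W @ (y_faulty - Dus @ u_s)
print(r_ok, r_fault)
```

With these numbers the left kernel of $C_s$ is one-dimensional, so $W$ has a single row; a longer window would enlarge the kernel and leave more design freedom.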

As has been mentioned, the existence of the solution to the equation (5.54) 
can always be assured. It can, however, be easily shown that such a solution 
can be non-unique. Attainable degrees of freedom should be utilised: (1) for a 
desirable ‘accentuation’ of the information on a probable fault in the residual 
vector (which corresponds to the postulate expressed in (5.55)); and (2) for 
possibly the best decoupling of this vector from disturbances (which, in turn, 
is consistent with the presumption of making the matrix WDds zero). It shall 
be shown in the next chapter of this book that, by starting with other than 
the currently examined premises (namely, when residue generator synthesis 
is based on a suitable formulation of the eigenstructure of a state observer 
for the diagnosed system), we can derive parity equations which are both 
decoupled from plant disturbances and characterised by improved numerical 
robustness. 

A general diagram representation of the residue generation system con- 
sidered here is shown in Fig. 5.1 (Chen and Patton, 1999). 



Fig. 5.1. General scheme of the residual vector generator

As has been shown in the previous section, effective consistency relations
interpreted in terms of the parity space can also be constructed on the basis
of input-output (operator) models of dynamic objects. With this, it turns
out (Gertler, 1991; 1995; 1998; Viswanadham et al., 1987) that the
state-space approach (Chow and Willsky, 1984) to residue generation synthesis
is equivalent to the transfer-function method when applied with almost full
non-homogeneous (aFnH) and homogeneous (aFH) specifications.



5.4. Deterministic assignment of state estimation 

An observer is a system which estimates the internal state of the process 
(object, plant) on the basis of external signals being observed. The concept 
of observation systems was introduced by Luenberger (1966) several years 
after the declaration of the optimal estimator (filter) by Kalman (Athans, 
1996). Thus it is worth emphasising that the Kalman filter described in the 
next section is, in fact, an exemplification of a Luenberger observer optimised 
with respect to stochastic excitations. Diagnostic Luenberger observers shall 
be discussed in Subsection 5.4.4. 

An observer of a dynamic process $P(x, y, u)$ - described by means of a
mathematical model including a vector state variable $x$, an output $y$ and
an input $u$ - also represents a dynamic system $\hat P(\hat x, y, u)$ whose state $\hat x$
converges to the state $x$ of the process $P$, irrespective of the applied input
$u$ and the actual state $x$ (Friedland, 1996).

Assume a discrete-time process model $P(x, y, u)$ in the standard
state-space form:

$$ x(k+1) = A x(k) + B_u u(k), \tag{5.56} $$

$$ y(k) = C x(k), \tag{5.57} $$

where $x(k) \in \mathbb{R}^{n}$ represents a state vector, $u(k) \in \mathbb{R}^{n_u}$ is a known control
signal, while $y(k) \in \mathbb{R}^{m}$ denotes a measurable output signal. The matrices
in the above formulae have suitable dimensions $A \in \mathbb{R}^{n \times n}$, $B_u \in \mathbb{R}^{n \times n_u}$
and $C \in \mathbb{R}^{m \times n}$, on the assumption that the matrix $C$ has a full row rank
($\mathrm{rank}(C) = m \le n$). Moreover, it is presumed that the pair $(A, C)$ is
completely observable.

5.4.1. Full-order observer 

Let $\hat x(k) \in \mathbb{R}^{n}$ denote a certain estimate of the state vector which,
according to (5.57), corresponds with the following plant output estimate:
$\hat y(k) = C \hat x(k) \in \mathbb{R}^{m}$. Since we have assumed that the output signal is
measurable, the error (or residual signal) of this estimate can be determined as

$$ y_e(k) = y(k) - \hat y(k) \in \mathbb{R}^{m}. \tag{5.58} $$

By introducing a parallel notion of the state estimation error:

$$ x_e(k) = x(k) - \hat x(k) \in \mathbb{R}^{n}, \tag{5.59} $$

we can easily observe that it is the signal $y_e(k)$ that carries the information
about the latter error:

$$ y_e(k) = C x_e(k). \tag{5.60} $$

A natural idea may thus occur that the residual signal can be utilised to
improve (in a certain sense) the state estimates generated by the observer.
With this, various criteria can be employed in the characterisation of the
quality of the estimates. In the next part of this subsection we shall consider
a fundamental guideline consisting primarily in assuring the asymptotic
convergence of the state estimation error to zero from any initial value $x_e(0)$.
This prerequisite, along with certain requirements concerning the rate of the
decay of the state estimation error, lays the foundation for the standard
design of asymptotic deterministic state observers. In such a task we assume
that the only source of the uncertainty of the state estimate is a lack of data
on the initial conditions $x(0)$, having assumed full (structural, parametric
and signal) knowledge of the model (5.56)-(5.57) of the monitored object. In
that case this type of knowledge stands for, for instance, information about
the control signal and the absence of plant disturbances.

The evolution of the vector $\hat x(k)$ can be described by the following state
equation:

$$ \hat x(k+1) = A \hat x(k) + B_u u(k) + K_o y_e(k), \tag{5.61} $$

in which - apart from obvious elements resulting from the model (5.57) -
there is an additional excitation term $K_o y_e(k)$, scaled by a matrix observer
gain $K_o \in \mathbb{R}^{n \times m}$. This term originates from the known residual signal $y_e(k)$
and makes a contribution to the improvement of the state estimate. The
scaling $K_o$ of the running residual signal is chosen in such a way that both
the assumed rate of convergence $\lim_{k \to \infty} x_e(k) = 0_n$ and suitable (aperiodic)
forms of transient processes are assured. With the postulated linear object
model, the above affine form of the observer is sufficient for the established
task of asymptotic observation. This formula turns out to be adequate also
for certain more complex tasks of observer synthesis, including those in which
some non-stationarity and uncertainty of the plant model are admitted.

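By taking into account the formulae (5.60) and (5.61), we obtain the
following fundamental equation of the state observer:

$$ \hat x(k+1) = A_o \hat x(k) + B_o \begin{bmatrix} u(k) \\ y(k) \end{bmatrix}, \tag{5.62} $$

where $A_o = A - K_o C \in \mathbb{R}^{n \times n}$ is a state transition matrix of the observer, and
$B_o = [\, B_{ou} \;\; B_{oy} \,] \in \mathbb{R}^{n \times (n_u + m)}$, $B_{ou} = B_u \in \mathbb{R}^{n \times n_u}$, $B_{oy} = K_o \in \mathbb{R}^{n \times m}$.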

The achieved state observer (5.62) has thus the form of a difference
equation with its right-hand side being affine with respect to both the running
estimate $\hat x(k)$, i.e., the observer output, and the external signals $u(k)$ and
$y(k)$, i.e., the observer input. As can be easily seen, for the given initial
conditions $x_e(0)$ the evolution of the state estimation error is described by the
following:

$$ x_e(k+1) = A_o x_e(k). \tag{5.63} $$

The design prerequisites concerning the error behaviour can thus be
expressed in terms of appropriate postulates to be fulfilled by the matrix $A_o$.
Certainly, a principal condition is the stability of $A_o$. The latter means that
the spectrum of the matrix, i.e., a set of its eigenvalues^3 $\lambda(A_o)$, should be
included in the open unit circle $v(z) = \{ z \in \mathbb{C} : |z| < 1 \}$ of the plane of a
complex variable (Brogan, 1991; Jury, 1964). Our expectations concerning
the rate of the transient estimation processes also can be expressed by
specifying acceptable/admitted subsets of the interior of the circle $v(z)$ (Ogata,
1995). If the pair $(A, C)$ is detectable, then it is possible to find such a value
for the design matrix $K_o$ that assures asymptotically the zeroing of the
estimation error (5.59), regardless of the initial value $x_e(0)$. In the case of
a completely observable pair $(A, C)$, the state transition matrix $A_o$ of the
observer can attain an arbitrary composition of its spectrum.

Consequently, the analysed deterministic state-estimation problem in the
closed loop reduces to a pole placement task, i.e., to placing the eigenvalues
of the observer state matrix $A_o$ at suitable positions on the complex
plane.
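The following sketch illustrates the full-order observer (5.61) and the error recursion (5.63) on a small example. It rests on several assumptions that are not part of the text above: the plant matrices are hypothetical, the output is scalar ($m = 1$), and the gain is computed with Ackermann's formula, which is used here only as a convenient textbook device for single-output pole placement. The convention $A_o = A - K_o C$ of (5.62) is followed.

```python
import numpy as np

def ackermann_observer_gain(A, C, poles):
    """Gain K_o placing eig(A - K_o C) at `poles` (single output only)."""
    n = A.shape[0]
    obsv = np.vstack([C @ np.linalg.matrix_power(A, i) for i in range(n)])
    coeffs = np.poly(poles)          # desired characteristic polynomial
    phi = sum(c * np.linalg.matrix_power(A, n - i)
              for i, c in enumerate(coeffs))
    e_n = np.zeros(n)
    e_n[-1] = 1.0
    return (phi @ np.linalg.solve(obsv, e_n)).reshape(n, 1)

# Hypothetical plant (assumption), observable pair (A, C)
A = np.array([[0.5, 1.0], [0.0, 0.8]])
Bu = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])

Ko = ackermann_observer_gain(A, C, [0.1, 0.2])
Ao = A - Ko @ C                      # observer state matrix, cf. (5.62)

x = np.array([1.0, -1.0])            # true state, unknown to the observer
x_hat = np.zeros(2)                  # initial estimate
for k in range(30):
    u = np.array([np.cos(0.2 * k)])
    y_e = C @ x - C @ x_hat          # residual (5.58)
    x_hat = A @ x_hat + Bu @ u + Ko @ y_e   # observer update (5.61)
    x = A @ x + Bu @ u                      # plant update (5.56)

err = np.linalg.norm(x - x_hat)
print(np.linalg.eigvals(Ao), err)
```

Because the estimation error obeys (5.63) regardless of the input, the decay rate is set by the assigned eigenvalues alone; eigenvalues closer to the origin give faster convergence at the price of larger gains.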

The order of the observer is equal to the order $n$ of the plant model.
Note that, in general, $n$ is different from the number $m$ of independent
observations, modelled in (5.51) and (5.57) as well as (5.1) and (5.8). Such an
observer, having the structure as in Fig. 5.2, is called a full-order observer. In
the next subsection we shall consider a simplified structure of a state observer,
characterised by a minimal order, referred to as a reduced- or minimal-order
observer.

Fig. 5.2. Structure of the full-order state observer

^3 For which the matrix $(A_o - \lambda I)$ is singular (has no inverse).



5.4.2. Minimal-order observer



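Let us assume that $m < n$. Moreover, let a matrix $\bar C \in \mathbb{R}^{(n-m) \times n}$ of a full
row rank, $\mathrm{rank}(\bar C) = n - m$, be an arbitrary orthogonal row complement of
the matrix $C$, meaning that $\mathrm{Im}([\, C^{\mathrm T} \;\; \bar C^{\mathrm T} \,]) = \mathbb{R}^{n}$ and $\mathrm{Im}(\bar C^{\mathrm T}) = \mathrm{Ker}(C)$.

The matrix $\bar C$ can be easily obtained from the svd^4 decomposition of
the matrix $C$: $C = U \Sigma [\, V \;\; \bar V \,]^{\mathrm T}$, where the factors $U \in \mathbb{R}^{m \times m}$ and
$[\, V \;\; \bar V \,] \in \mathbb{R}^{n \times n}$ are orthonormal matrices of the left and right singular
vectors^5 of $C$, with $V \in \mathbb{R}^{n \times m}$ and $\bar V \in \mathbb{R}^{n \times (n-m)}$, while $\Sigma \in \mathbb{R}^{m \times n}$ is a
matrix of a block structure, in which we have a diagonal submatrix containing
singular^6 values of the matrix $C$. As the columns of the submatrix $\bar V$
constitute an orthonormal basis for the null subspace $\mathrm{Ker}(C)$, we find that
$\bar C = \bar V^{\mathrm T}$.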

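Moreover, it can be easily verified that

$$ \begin{bmatrix} C \\ \bar C \end{bmatrix}^{-1} = [\, C^{+} \;\; \bar C^{+} \,], $$

where the right pseudo-inverse matrices (Boullion and Odell, 1971; Rao and
Mitra, 1971) have the following form: $C^{+} = C^{\mathrm T}(C C^{\mathrm T})^{-1} \in \mathbb{R}^{n \times m}$ and
$\bar C^{+} = \bar C^{\mathrm T}(\bar C \bar C^{\mathrm T})^{-1} \in \mathbb{R}^{n \times (n-m)}$.

The above defined matrices determine a useful direct-sum decomposition
of the state space: $\mathbb{R}^{n} = \mathrm{Im}(C^{\mathrm T}) \oplus \mathrm{Im}(\bar C^{\mathrm T})$, which, at any time moment, $\forall k$,
allows the following unique representation of the state vector $x(k)$:

$$ x(k) = x_C(k) + x_{\bar C}(k), $$

having the components $x_C(k) \in \mathrm{Im}(C^{\mathrm T})$ and $x_{\bar C}(k) \in \mathrm{Im}(\bar C^{\mathrm T})$. The
respective matrices of orthogonal projections, such that $x_C(k) = P_{\mathrm{Im}(C^{\mathrm T})}\, x(k)$
and $x_{\bar C}(k) = P_{\mathrm{Im}(\bar C^{\mathrm T})}\, x(k)$, are constructed as
$P_{\mathrm{Im}(C^{\mathrm T})} = C^{\mathrm T}(C C^{\mathrm T})^{-1} C = C^{+} C \in \mathbb{R}^{n \times n}$ and
$P_{\mathrm{Im}(\bar C^{\mathrm T})} = \bar C^{\mathrm T}(\bar C \bar C^{\mathrm T})^{-1} \bar C = \bar C^{+} \bar C \in \mathbb{R}^{n \times n}$.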
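These relations are easy to check numerically. The sketch below uses a hypothetical $2 \times 3$ matrix $C$ of full row rank (an assumption made only for illustration), takes $\bar C$ from the trailing right singular vectors returned by `numpy.linalg.svd`, and verifies the inverse of the stacked matrix as well as the two complementary projections.

```python
import numpy as np

# Hypothetical full-row-rank output matrix (m = 2, n = 3)
C = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0]])
m, n = C.shape

# Full SVD: the rows of Vt beyond the first m span Ker(C) when rank(C) = m
U, sv, Vt = np.linalg.svd(C)
C_bar = Vt[m:, :]                       # orthogonal row complement of C

# Right pseudo-inverses
C_pinv = C.T @ np.linalg.inv(C @ C.T)
C_bar_pinv = C_bar.T @ np.linalg.inv(C_bar @ C_bar.T)

stacked = np.vstack([C, C_bar])         # [C; C_bar], n x n
T = np.hstack([C_pinv, C_bar_pinv])     # claimed inverse of the stack

P_C = C_pinv @ C                        # projection onto Im(C^T)
P_C_bar = C_bar_pinv @ C_bar            # projection onto Im(C_bar^T) = Ker(C)

print(np.round(stacked @ T, 12))        # should be the identity matrix
print(np.round(P_C + P_C_bar, 12))      # projections should sum to identity
```

Since the rows of $\bar C$ obtained in this way are orthonormal, $\bar C^{+}$ reduces to $\bar C^{\mathrm T}$ here; the general formula is kept in the code so that any other complement could be substituted.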

By virtue of the above, by taking into account (5.57), we adopt

$$ x(k) = C^{+} y(k) + x_{\bar C}(k). \tag{5.64} $$

Since $C x_{\bar C}(k) = 0_m$, the component $x_{\bar C}(k)$ evidently cannot appear in
the output signal $y(k)$, $\forall k$. It is thus necessary to build a suitable observer
of this component. To that end, let us introduce an auxiliary output signal
$\bar y(k) \in \mathbb{R}^{n-m}$ as

$$ \bar y(k) = \bar C x(k). \tag{5.65} $$

On that account, $x_{\bar C}(k) = \bar C^{+} \bar y(k)$, which entitles us to interpret $x_{\bar C}(k)$ as
a certain 'observation' of the signal $\bar y(k)$. An aggregate of (5.57) and (5.65)
results in

$$ \begin{bmatrix} y(k) \\ \bar y(k) \end{bmatrix} = \begin{bmatrix} C \\ \bar C \end{bmatrix} x(k). \tag{5.66} $$

Consequently, the equation (5.64) can immediately be shown as

$$ x(k) = [\, C^{+} \;\; \bar C^{+} \,] \begin{bmatrix} y(k) \\ \bar y(k) \end{bmatrix}, \tag{5.67} $$

which, in turn, allows us to portray the vector state $x(k)$ as an affine^7
transformation (observation) of the auxiliary output $\bar y(k)$.

^4 Singular value decomposition; see, for instance, Forsythe et al. (1977) or Brogan
(1991).

^5 Being (right) eigenvectors of $C C^{\mathrm T}$ and $C^{\mathrm T} C$, respectively.

^6 The positive square roots of the eigenvalues of $C^{\mathrm T} C$.

Taking the above into consideration, we can approach the task of the
synthesis of an observer of the auxiliary signal $\bar y(k)$. According to the preceding
findings, such an observer should have the form of a difference equation with
its right-hand side affine with respect to the available signals $u(k)$ and $y(k)$.
From the equations (5.56) and (5.67) it results that

$$ \bar C C^{+} y(k+1) + \bar C \bar C^{+} \bar y(k+1) = \bar C A x(k) + \bar C B_u u(k), $$

which - considering $\bar C C^{+} = O_{(n-m) \times m}$ and $\bar C \bar C^{+} = I_{n-m}$ - makes

$$ \bar y(k+1) = \bar C A x(k) + \bar C B_u u(k). $$

Therefore, the equation of the sought natural observer of the auxiliary
output signal $\bar y(k)$ acquires the following state-equation form (consistent
with the former findings):

$$ \hat{\bar y}(k+1) = \bar C A \bar C^{+} \hat{\bar y}(k) + \bar C B_u u(k) + \bar C A C^{+} y(k), $$

with the initial condition $\hat{\bar y}(0) = \bar C \hat x(0)$. In order to describe an asymptotic
observer by this equation, it is necessary for the matrix $\bar C A \bar C^{+}$ to be stable.

Now we are going to modify the preceding procedure so that the
state-transition matrix of the observer is an affine transform of a certain matrix
parameter. Such a setting, along with the fulfilment of the detectability of
the respective matrix pair, renders the stabilisation of this observer in terms
of shaping a diminishing-to-zero trajectory of the observation error possible.

To that end, let us introduce another subject output signal $w(k) \in \mathbb{R}^{n-m}$ as
the following affine variant of the auxiliary signal $\bar y(k)$:

$$ w(k) = \bar y(k) - K_o C x(k), $$

where $K_o \in \mathbb{R}^{(n-m) \times m}$ is a free matrix parameter. With this it is clear that

$$ w(k) = (\bar C - K_o C)\, x(k). $$

Now the equivalent of (5.66) is as follows:

$$ \begin{bmatrix} y(k) \\ w(k) \end{bmatrix} =
   \begin{bmatrix} I_m & O_{m \times (n-m)} \\ -K_o & I_{n-m} \end{bmatrix}
   \begin{bmatrix} C \\ \bar C \end{bmatrix} x(k). $$

^7 In view of the knowledge of the measured output $y(k)$.

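The first factor of the right-hand side in the above is, independently of the
submatrix $K_o$, an invertible matrix:

$$ \begin{bmatrix} I_m & O_{m \times (n-m)} \\ -K_o & I_{n-m} \end{bmatrix}^{-1} =
   \begin{bmatrix} I_m & O_{m \times (n-m)} \\ K_o & I_{n-m} \end{bmatrix}. $$

Hence, by utilising the inversion used in transforming (5.66) into (5.67), we
acquire

$$ x(k) = [\, C^{+} \;\; \bar C^{+} \,]
   \begin{bmatrix} I_m & O_{m \times (n-m)} \\ K_o & I_{n-m} \end{bmatrix}
   \begin{bmatrix} y(k) \\ w(k) \end{bmatrix}
   = (C^{+} + \bar C^{+} K_o)\, y(k) + \bar C^{+} w(k). \tag{5.68} $$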



The output signal $y(k)$ is measurable. Thus it is sufficient to build an
observer for the subject output signal $w(k)$. By means of a treatment
analogous to the one presented above, we derive the following equation:

$$ w(k+1) = (\bar C - K_o C)\, x(k+1) = (\bar C - K_o C)\,(A x(k) + B_u u(k)) $$

$$ \qquad = (\bar C - K_o C) A \bar C^{+} w(k) + (\bar C - K_o C) B_u u(k)
   + (\bar C - K_o C) A (C^{+} + \bar C^{+} K_o)\, y(k). $$



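Hence we have the following observer-based solution for $w(k)$ (confer (5.62)):

$$ \hat w(k+1) = A_w \hat w(k) + B_w \begin{bmatrix} u(k) \\ y(k) \end{bmatrix}, \tag{5.69} $$

in which

$$ A_w = (\bar C - K_o C) A \bar C^{+}
       = [\bar C A \bar C^{+}] - K_o [C A \bar C^{+}]
       = \tilde A - K_o \tilde C \in \mathbb{R}^{(n-m) \times (n-m)}, \tag{5.70} $$

$$ B_w = [\, B_{wu} \;\; B_{wy} \,] \in \mathbb{R}^{(n-m) \times (n_u + m)}, \tag{5.71} $$

$$ B_{wu} = (\bar C - K_o C) B_u \in \mathbb{R}^{(n-m) \times n_u}, $$

$$ B_{wy} = (\bar C - K_o C) A (C^{+} + \bar C^{+} K_o) \in \mathbb{R}^{(n-m) \times m}. $$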

The evolution of the subject-output estimation error

$$ w_e(k) = w(k) - \hat w(k) \in \mathbb{R}^{n-m} $$

for a given $w_e(0)$ is described, similarly to (5.63), as follows:

$$ w_e(k+1) = A_w w_e(k). $$

As results from (5.70), the necessary and sufficient condition for the
stability of the subject observer (5.69) is the detectability of the matrix pair
$(\bar C A \bar C^{+}, C A \bar C^{+})$. In this context, the following fundamental question arises:
What kind of a relationship exists between the detectability of this pair
and of the 'original' pair $(A, C)$? The answer is expressed by means of the
following theorem.



Theorem 5.1. (On detectability.) The matrix pair $(\bar C A \bar C^{+}, C A \bar C^{+})$ is
detectable if and only if the pair $(A, C)$ is detectable.

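Proof. A suitably defined similarity transformation will be used to show^8
that the detectability of $(A, C)$ is a necessary condition for the detectability
of $(\tilde A, \tilde C) = (\bar C A \bar C^{+}, C A \bar C^{+})$. The sufficient condition results immediately
from an inversion of the following procedure.

Consider an auxiliary matrix pair $(\underline A, \underline C) = (T^{-1} A T, C T)$, where a
non-singular similarity transformation matrix^9 $T \in \mathbb{R}^{n \times n}$ is defined as

$$ T = [\, C^{+} \;\; \bar C^{+} \,]. \tag{5.72} $$

Thus we have

$$ \underline A = \begin{bmatrix} C A C^{+} & C A \bar C^{+} \\ \bar C A C^{+} & \bar C A \bar C^{+} \end{bmatrix}, \quad
   \underline C = [\, I_m \;\; O_{m \times (n-m)} \,]. \tag{5.73} $$

Now let $\lambda$ be an eigenvalue of an unobservable mode of the system
portrayed by the pair $(A, C)$ and a similar pair $(\underline A, \underline C)$. This means that there
exists a right eigenvector $v \in \mathbb{C}^{n}$ of the matrix $\underline A$ for which $\underline A v = \lambda v$ and
$\underline C v = 0_m$. The following form of this vector results from (5.73):
$v = [\, 0_m^{\mathrm T} \;\; \bar v^{\mathrm T} \,]^{\mathrm T}$, and it has a non-zero sub-vector $\bar v \neq 0_{n-m}$. Hence we
obtain the following formula:

$$ \begin{bmatrix} C A \bar C^{+} \\ \bar C A \bar C^{+} \end{bmatrix} \bar v
   = \begin{bmatrix} \tilde C \\ \tilde A \end{bmatrix} \bar v
   = \begin{bmatrix} 0_m \\ \lambda \bar v \end{bmatrix}, $$

which shows that $\lambda$ is an unobservable eigenvalue of the pair
$(\tilde A, \tilde C) = (\bar C A \bar C^{+}, C A \bar C^{+})$ as well. ■

^8 By using the law of simple transposition.

^9 Representing a change of the basis used for an equivalent representation of the state
vector of the system.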



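The sought state observer of a minimal order has the form similar
to (5.62):

$$ \hat w(k+1) = A_w \hat w(k) + B_w \begin{bmatrix} u(k) \\ y(k) \end{bmatrix}, \qquad
   \hat x(k) = B_x \begin{bmatrix} y(k) \\ \hat w(k) \end{bmatrix}, \tag{5.74} $$

where $B_x = [\, B_{xy} \;\; B_{xw} \,] \in \mathbb{R}^{n \times n}$, $B_{xy} = C^{+} + \bar C^{+} K_o \in \mathbb{R}^{n \times m}$ and
$B_{xw} = \bar C^{+} \in \mathbb{R}^{n \times (n-m)}$.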

A structural scheme of this observer is shown in Fig. 5.3.

Fig. 5.3. State observer of a minimal order

Since we assume the knowledge^10 of the output $y(k)$, the only source
of the estimation error is the uncertainty included in the applied evaluation
of the initial conditions $\hat w(0)$. Therefore, let us now deliberate on the state
estimation error $x_e(k)$. Based on the definitions of $x_e(k)$ and $w_e(k)$, as well
as in view of (5.68) and (5.74), we deduce that $x_e(k) = \bar C^{+} w_e(k)$, which
confirms the fact that for the reduced-order observer, $x_e(k) \in \mathrm{Im}(\bar C^{\mathrm T}) =
\mathrm{Ker}(C)$, $\forall k$.

^10 Apart from the information about a precise model of the object.

How to tune such an observer? This is a classic problem of pole placement
(stabilisation) for the pair $(\tilde A, \tilde C) = (\bar C A \bar C^{+}, C A \bar C^{+})$: we just seek a matrix
parameter $K_o \in \mathbb{R}^{(n-m) \times m}$ which assures that the matrix $A_w$ will have
predetermined eigenvalues.



Characteristics 5.1. (Minimal-order observers.)

1. The synthesis of the minimal-order observer for (5.56)-(5.57) can be
greatly simplified when some of the object states are 'directly' available,
i.e., when

$$ C = [\, I_m \;\; O_{m \times (n-m)} \,]. \tag{5.75} $$

It is clear from the above formula that - in order to facilitate further
derivations - all the available states have been assigned to the first $m$
coordinates of the state vector. As a result, the state vector is now divided
into two subvectors $\underline x(k) \in \mathbb{R}^{m}$ and $\bar x(k) \in \mathbb{R}^{n-m}$:

$$ x(k) = \begin{bmatrix} \underline x(k) \\ \bar x(k) \end{bmatrix}
        = \begin{bmatrix} y(k) \\ \bar x(k) \end{bmatrix}. $$

Thus we can accordingly infer that $\bar C = [\, O_{(n-m) \times m} \;\; I_{n-m} \,]$, which
immediately results in

$$ C^{+} = \begin{bmatrix} I_m \\ O_{(n-m) \times m} \end{bmatrix}, \quad \text{and} \quad
   \bar C^{+} = \begin{bmatrix} O_{m \times (n-m)} \\ I_{n-m} \end{bmatrix}. $$

Now we can easily verify that

$$ \hat x(k) = \begin{bmatrix} \hat{\underline x}(k) \\ \hat{\bar x}(k) \end{bmatrix}
             = \begin{bmatrix} y(k) \\ K_o y(k) + \hat w(k) \end{bmatrix}, $$

where the matrices $A_w$, $B_{wu}$ and $B_{wy}$, which parameterise the
model (5.69)-(5.71) describing the evolution of the observer state vector
$\hat w(k)$, acquire the following form: $A_w = A_{22} - K_o A_{12}$,
$B_{wu} = B_{u2} - K_o B_{u1}$ and $B_{wy} = A_{21} + A_{22} K_o - K_o A_{11} - K_o A_{12} K_o$, in which

$$ A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}, \quad
   A_{11} \in \mathbb{R}^{m \times m}, \; A_{12} \in \mathbb{R}^{m \times (n-m)}, \;
   A_{21} \in \mathbb{R}^{(n-m) \times m}, \; A_{22} \in \mathbb{R}^{(n-m) \times (n-m)}, $$

$$ B_u = \begin{bmatrix} B_{u1} \\ B_{u2} \end{bmatrix}, \quad
   B_{u1} \in \mathbb{R}^{m \times n_u}, \; B_{u2} \in \mathbb{R}^{(n-m) \times n_u}. $$

2. The condition for the observer's stability is the detectability of the pair
$(A_{22}, A_{12})$. In the case considered, an error of the estimation of the
'subject' state subvector $\bar x(k)$, defined by $\bar x_e(k) = \bar x(k) - \hat{\bar x}(k)$, is described as

$$ \bar x_e(k) = w_e(k). $$

3. The simplicity of the above recipes encourages us to test their
applicability also in cases in which the matrix $C$ diverges from the form
described by (5.75). To this end, the original matrix triplet $(A, B, C)$ is
transformed into a similar one, $(T^{-1} A T, T^{-1} B, C T)$, by using the
transformation matrix $T$ given by (5.72). After the prescribed determination
of a state estimate $\hat x_T(k) \in \mathbb{R}^{n}$ of this model, the requested estimate $\hat x(k)$
of the plant state $x(k)$ is re-calculated by means of inverse similarity
conversion, leading to the original system of coordinates: $\hat x(k) = T \hat x_T(k)$.
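The partitioned formulae of item 1 can be exercised directly. The following sketch is an illustration under stated assumptions: the plant matrices, the gain $K_o = 0.7$ and the helper name `reduced_observer_matrices` are hypothetical, and the output is the first state ($C = [\, I_m \;\; O \,]$ with $m = 1$, $n = 2$). It simulates (5.69) together with the reconstruction $\hat{\bar x}(k) = K_o y(k) + \hat w(k)$ and records the estimation error of the unmeasured state.

```python
import numpy as np

def reduced_observer_matrices(A, Bu, Ko, m):
    """A_w, B_wu, B_wy for the special case C = [I_m  O]."""
    A11, A12 = A[:m, :m], A[:m, m:]
    A21, A22 = A[m:, :m], A[m:, m:]
    Bu1, Bu2 = Bu[:m], Bu[m:]
    Aw = A22 - Ko @ A12
    Bwu = Bu2 - Ko @ Bu1
    Bwy = A21 + A22 @ Ko - Ko @ A11 - Ko @ A12 @ Ko
    return Aw, Bwu, Bwy

# Hypothetical plant; the first state is measured directly
A = np.array([[0.5, 1.0], [0.2, 0.8]])
Bu = np.array([[0.0], [1.0]])
m, n = 1, 2
Ko = np.array([[0.7]])                  # assumed gain, gives A_w = 0.1
Aw, Bwu, Bwy = reduced_observer_matrices(A, Bu, Ko, m)

x = np.array([1.0, -2.0])               # true plant state
w_hat = np.zeros(n - m)                 # observer state, initial guess
errors = []
for k in range(25):
    u = np.array([np.sin(0.3 * k)])
    y = x[:m]                           # measured part of the state
    x_bar_hat = Ko @ y + w_hat          # estimate of the unmeasured part
    errors.append(np.linalg.norm(x[m:] - x_bar_hat))
    w_hat = Aw @ w_hat + Bwu @ u + Bwy @ y   # observer recursion (5.69)
    x = A @ x + Bu @ u                       # plant recursion (5.56)

print(Aw, errors[:3], errors[-1])
```

The recorded error shrinks by the factor $|A_w| = 0.1$ per step, in agreement with item 2; only an observer of order $n - m = 1$ is needed instead of the full order $n = 2$.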

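Characteristics 5.2. (Operator forms of observers.)

1. The full-order observer (5.62) is characterised by the following formulae:

$$ \hat x(z) = (z I_n - A_o)^{-1} B_u u(z) + (z I_n - A_o)^{-1} K_o y(z) + z (z I_n - A_o)^{-1} \hat x(0), $$

$$ x_e(z) = z (z I_n - A_o)^{-1} x_e(0). $$

2. For the minimal-order observer (5.74), in turn, we can give the following
description:

$$ \hat x(z) = \bar C^{+} (z I_{n-m} - A_w)^{-1} (\bar C - K_o C) B_u u(z) $$

$$ \qquad + \left( I_n + \bar C^{+} (z I_{n-m} - A_w)^{-1} (\bar C - K_o C) A \right) (C^{+} + \bar C^{+} K_o)\, y(z) $$

$$ \qquad + z \bar C^{+} (z I_{n-m} - A_w)^{-1} \hat w(0), $$

$$ x_e(z) = z \bar C^{+} (z I_{n-m} - A_w)^{-1} w_e(0). $$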



5.4.3. Observer matrix determination by pole placement 

As has been shown above, the task of full-order observer synthesis can be
interpreted in terms of obtaining a matrix $K_o \in \mathbb{R}^{n \times m}$, satisfying a given
assignment of the observer's eigenvalues:

$$ \lambda(A + K_o C) = \{ \lambda_i \}_{i=1}^{n}, \tag{5.76} $$

in which^11 complex eigenvalues appear solely in conjugate pairs.

For an arbitrary set of eigenvalues $\{ \lambda_i \}_{i=1}^{n}$ the matrix $K_o$ exists iff^12 the
matrix pair $(A, C)$ is completely observable. The latter, in turn, is
satisfied if and only if (Barnett, 1971; Brogan, 1991; Kailath, 1980):
$\{ A v = \lambda v \wedge C v = 0_m \} \Rightarrow v = 0_n$, which can be explained as follows. If
the pair $(A, C)$ is not completely observable, which means that there exists
$v \neq 0_n$ such that for a certain $\lambda$ we have $A v = \lambda v$ and $C v = 0_m$, then
$(A + K_o C) v = \lambda v$ for any $K_o \in \mathbb{R}^{n \times m}$. This testifies that $\lambda \in \lambda(A + K_o C)$
independently of $K_o$. Such an unobservable mode/eigenvalue $\lambda$ cannot thus
be changed with the aid of the feedback of the gain $K_o$. In that case, with
the pair $(A, C)$ being not completely observable, a solution $K_o$ of the pole
placement problem (5.76) exists iff the set of all unobservable eigenvalues
associated with this pair is a subset of the set of the assigned eigenvalues of
the observer state matrix $A_o$ (Kailath, 1980; Wonham, 1979).
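The invariance of an unobservable eigenvalue under output feedback can be seen in a short numerical experiment. The matrices below are hypothetical and were chosen so that $v = [0, 1]^{\mathrm T}$ satisfies $A v = 0.9 v$ and $C v = 0$; the gains are drawn at random with a fixed seed.

```python
import numpy as np

# Hypothetical pair (A, C) with an unobservable mode at lambda = 0.9
A = np.array([[0.5, 0.0], [1.0, 0.9]])
C = np.array([[1.0, 0.0]])
lam = 0.9
v = np.array([0.0, 1.0])                # A v = lam v and C v = 0

obsv = np.vstack([C, C @ A])            # observability matrix of (A, C)
rank = np.linalg.matrix_rank(obsv)      # rank deficiency confirms the claim

rng = np.random.default_rng(1)
gaps = []
for _ in range(5):
    Ko = rng.standard_normal((2, 1))    # arbitrary gain
    eigs = np.linalg.eigvals(A + Ko @ C)
    gaps.append(np.min(np.abs(eigs - lam)))   # distance of lam to spectrum

print(rank, gaps)
```

Whatever gain is drawn, one eigenvalue of $A + K_o C$ stays at $0.9$, while the other moves with the first entry of $K_o$. An assignment (5.76) is therefore solvable for this pair only if $0.9$ belongs to the requested set.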

For scalar measurements ($m = 1$), if the assignment (5.76) has a
solution, it is a unique one (Petkov et al., 1991). In a multiple case with
$1 < m < n$, the solutions - if they exist - are not unique. Hence the choice
of a specific solution requires specifying^13 additional criteria as the basis
for suitable decision making (Patel and Misra, 1984; Wonham, 1979). Such
criteria can, for instance, concern the robustness of numerical procedures for
finding $K_o$, which has an extraordinary meaning in cases when the
plant-model parameters $(A, C)$ are subject to perturbations (Byers and Nash, 1989;
Dickman, 1987; Gourishankar and Ramer, 1976; Kautsky et al., 1985; Tits
and Yang, 1996).

In the presence of many available methods of solving the basic problem of
pole placement/assignment (5.76), and as a consequence of the revealed
possibility of the optimisation of the designed linear dynamic system (connected
with the above-mentioned non-uniqueness of design), the issue of a suitable
parameterisation of the set of the gains $K_o$ is of great importance. Such a
parameterisation, whose main purpose is to facilitate the procedure of
seeking optimal^14 solutions for $K_o$, can be effectively based on the methodology
of shaping an eigenstructure of the observer matrix $A_o$. Precisely speaking,
such a design affects not only the eigenvalues, but also the eigenvectors of
this matrix. And, in particular, independence in modeling the eigenvectors
can be ascribed to a rational utilisation of the design freedom/non-uniqueness
(Andry et al., 1983; Liu and Patton, 1998; Sobel et al., 1996; Weinmann, 1991;
Zagalak et al., 1993).

^11 For practical reasons concerning the system's realisability.

^12 An abbreviation for 'if and only if'.

^13 See also Section 5.2.

^14 In a certain sense.

Searching for an optimal eigenstructure of the observer state-transition
matrix can easily adopt certain goals of the synthesis of detection observers
(Kowalczuk and Suchomski, 1998; 1999; Liu and Patton, 1998; Patton and
Chen, 1991b; Suchomski and Kowalczuk, 1999). In the next chapter of this
book we shall present rules of synthesis established on a parameterisation of
the so-called attainable eigensubspaces of the state transition matrix of such
an observer. Thus, below we shall only give a general blueprint on how to
parameterise the set of solutions of the pole placement task, and, in addition
to this, we shall show two elementary methods for scalar plants.

To the assumed set of eigenvalues $\{\lambda_i\}_{i=1}^{n}$ we can assign a corresponding 
set of suitable left eigenvectors: $l_i^T A_o = \lambda_i l_i^T$, $l_i \neq 0_n$, $i \in \{1, \ldots, n\}$. 
Let $\Lambda_n = \mathrm{diag}\{\lambda_i\}_{i=1}^{n} \in \mathbb{C}^{n \times n}$ denote a diagonal matrix built of these eigenvalues, 
and let $L_n = [\, l_1 \ \cdots \ l_n \,] \in \mathbb{C}^{n \times n}$ be a matrix whose columns are 
composed of their respective left eigenvectors. Hence we have $A_o^T L_n = L_n \Lambda_n$. 

Ordinarily, precisely for the above-mentioned robustness purposes, we assume 
that the designed matrix $A_o$ is diagonalisable (non-defective), i.e. of a simple 
eigenstructure (Kautsky et al., 1985; Tits and Yang, 1996; Weinmann, 1991). 
This, among other things, means that the matrix $A_o$ has $n$ linearly 
independent eigenvectors (Chatelin, 1993; Meyer, 2000; Wilkinson, 1965). By 
accepting this assumption, however, one has to agree to a certain restriction 
imposed on the maximal (arithmetic and geometric!) multiplicities of the 
assigned eigenvalues. Consequently, by treating the matrix $\Lambda_n$ as prefixed, 
and the non-singular matrix $L_n$ as a parameter, we can state a simple lemma 
about the existence and general form of the solutions of the pole placement 
problem (5.76), which now assumes the following form: 

$$L_n^T (A + K_o C) = \Lambda_n L_n^T. \tag{5.77}$$

Yet, before we do this, let us introduce the following convenient 
representation of the observation matrix $C$: 

$$C = [\, W_c \ \ 0_{m \times (n-m)} \,]\,[\, \underline{V} \ \ V \,]^T,$$

where $W_c \in \mathbb{R}^{m \times m}$ is a non-singular matrix, while the submatrices 
$\underline{V} \in \mathbb{R}^{n \times m}$ and $V \in \mathbb{R}^{n \times (n-m)}$ compose the orthonormal factor of the matrix $C$. 
On the assumption of a full row rank of this matrix, such a representation 
always exists. It can be obtained, for instance, by means of the SVD 
decomposition $C = U\,[\, \Sigma \ \ 0_{m \times (n-m)} \,]\,[\, \underline{V} \ \ V \,]^T$, in which $U \in \mathbb{R}^{m \times m}$ is an 
orthogonal matrix, while $\Sigma \in \mathbb{R}^{m \times m}$ denotes a diagonal submatrix having 
all its diagonal elements positive (being the singular values of the matrix $C$). 
On the basis of this, we immediately obtain $W_c = U\Sigma$. 
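The factorisation above can be sketched numerically as follows. This is only an illustrative sketch: the function name `factor_output_matrix` and the example matrix are assumptions introduced here, and the full SVD of NumPy is used to obtain both orthonormal blocks at once.

```python
import numpy as np

def factor_output_matrix(C):
    """Return (Wc, V_under, V) such that C = [Wc 0] [V_under V]^T.

    Assumes C is m x n with m <= n and full row rank.
    """
    m, n = C.shape
    U, s, Vh = np.linalg.svd(C)      # full SVD: U is m x m, Vh is n x n
    if s.min() <= 1e-12:
        raise ValueError("C must have full row rank")
    Wc = U @ np.diag(s)              # non-singular factor Wc = U * Sigma
    V_under = Vh[:m].T               # n x m block, spans Im(C^T)
    V = Vh[m:].T                     # n x (n - m) block, spans Ker(C)
    return Wc, V_under, V

# Hypothetical example with m = 2 outputs and n = 3 states
C = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, -1.0]])
Wc, V_under, V = factor_output_matrix(C)
C_rebuilt = np.hstack([Wc, np.zeros((2, 1))]) @ np.hstack([V_under, V]).T
```

Here `C_rebuilt` should reproduce `C` up to rounding, and `C @ V` should vanish, since the columns of `V` span the null space of `C`.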



As will be done in further deliberations. 

Lemma 5.1. (About the existence and form of the solutions of the pole 
placement problem.) 

(i) If the diagonal $\Lambda_n$ and non-singular $L_n$ matrices are given, then the 
solution $K_o$ of the pole placement problem (5.76) exists iff 

$$(L_n^T A - \Lambda_n L_n^T)\,V = 0_{n \times (n-m)}. \tag{5.78}$$

(ii) Moreover, the solution has the following form: 

$$K_o = (L_n^{-T} \Lambda_n L_n^T - A)\,\underline{V}\,W_c^{-1}.$$



Proof. The pole placement formulation (5.77) yields the following equation 
to be fulfilled by $K_o$: 

$$K_o C = L_n^{-T} \Lambda_n L_n^T - A.$$

After the (right) post-multiplication of the above by $[\, \underline{V} \ \ V \,]$, we retrieve 
two equivalent equations: 

$$K_o W_c = (L_n^{-T} \Lambda_n L_n^T - A)\,\underline{V},$$

$$(L_n^{-T} \Lambda_n L_n^T - A)\,V = 0_{n \times (n-m)},$$

from which both statements of the lemma result immediately. 

As a commentary to the above, we can assert that the necessary and 
sufficient condition for the existence of $K_o$ is the fulfilment of the inclusion 
$\mathrm{Im}(L_n \Lambda_n L_n^{-1} - A^T) \subset \mathrm{Im}(C^T) = \mathrm{Im}(\underline{V})$. This is equivalent to the demand 
for $\mathrm{Im}(L_n \Lambda_n L_n^{-1} - A^T) \perp \mathrm{Ker}(C) = \mathrm{Im}(V)$, which is just expressed by (5.78). 
Moreover, let us note that the latter can be equivalently shown as 
$V^T (A^T - \lambda_i I_n)\, l_i = 0_{n-m}$, $i \in \{1, \ldots, n\}$, hence we have the necessary condition for each 
attainable left eigenvector: $l_i \in \mathrm{Ker}\big(V^T (A^T - \lambda_i I_n)\big)$. Since for a completely 
observable pair $(A, C)$ the dimension of the above null subspace is $m$, so is 
the maximal admissible multiplicity of any eigenvalue $\lambda_i$. ■ 
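Lemma 5.1 can be illustrated with a short numerical sketch. It picks each left eigenvector $l_i$ from the null space $\mathrm{Ker}(V^T(A^T - \lambda_i I_n))$ and then evaluates the gain formula of part (ii). The helper names, the random choice of the eigenvectors within each null space, and the example matrices are assumptions made here for illustration only; real, distinct eigenvalues and $m < n$ are assumed.

```python
import numpy as np

def null_space(M, tol=1e-10):
    """Orthonormal basis of Ker(M), computed from the full SVD."""
    _, s, vh = np.linalg.svd(M)
    rank = int(np.sum(s > tol))
    return vh[rank:].T

def gain_from_eigenstructure(A, C, eigs, rng):
    """Sketch of Lemma 5.1: build L_n from attainable left eigenvectors,
    then K_o = (L^{-T} Lambda L^T - A) V_under Wc^{-1}."""
    n, m = A.shape[0], C.shape[0]
    U, s, Vh = np.linalg.svd(C)
    Wc, V_under, V = U @ np.diag(s), Vh[:m].T, Vh[m:].T
    cols = []
    for lam in eigs:
        basis = null_space(V.T @ (A.T - lam * np.eye(n)))   # n x m basis
        cols.append(basis @ rng.standard_normal(basis.shape[1]))
    L = np.column_stack(cols)                # assumed non-singular
    Lam = np.diag(eigs)
    Ko = (np.linalg.inv(L.T) @ Lam @ L.T - A) @ V_under @ np.linalg.inv(Wc)
    return Ko, L, Lam, V

# Hypothetical observable pair (A, C) with n = 3, m = 2
A = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.5, -0.2, 0.1]])
C = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])
eigs = np.array([0.1, 0.2, 0.3])
Ko, L, Lam, V = gain_from_eigenstructure(A, C, eigs, np.random.default_rng(0))
placed = np.sort(np.linalg.eigvals(A + Ko @ C).real)
```

With this construction the existence condition (5.78) holds by design, and the spectrum of $A + K_o C$ should coincide with the requested eigenvalues. A different random draw gives a different gain with the same spectrum, which reflects the non-uniqueness for $m > 1$.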

To illustrate the preceding discussion, let us consider a particular solution 
of the pole placement problem for the case of state observers of dynamic 
objects with a scalar output ($m = 1$). The model of such a plant, having a 
state vector $x(k) \in \mathbb{R}^n$ and an output $y(k) \in \mathbb{R}$, has the form 

$$x(k+1) = A x(k), \qquad y(k) = c^T x(k). \tag{5.79}$$



The corresponding full-order state observer can thus be described as 

$$\hat{x}(k+1) = (A - k_o c^T)\,\hat{x}(k) + k_o y(k), \tag{5.80}$$

where $\hat{x}(k) \in \mathbb{R}^n$ denotes a state estimate, and $k_o \in \mathbb{R}^n$ is a parameter vector 
of feedback gain. With the assumed complete observability of the pair $(A, c^T)$, 
resulting in the invertibility of its corresponding observability matrix 
$M_o \in \mathbb{R}^{n \times n}$, the coordinates of the vector $k_o$, which assigns to the observer state 
matrix $A_o = A - k_o c^T$ the assumed eigenvalues $\{\lambda_i\}_{i=1}^{n}$, can be determined 
by following an elementary procedure. Such a procedure can be based on 
the method of Ackermann and plant-model transformation to the canonical 
observer form (Brogan, 1991; Fairman, 1998; Ogata, 1995; 1996). 

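The resulting Ackermann formula asserts that 

$$k_o = \varphi(A)\, M_o^{-1} e_n, \tag{5.81}$$

where $\varphi(A) \in \mathbb{R}^{n \times n}$ is an image of $A$ in a mapping defined by the 
characteristic polynomial $\varphi(\lambda) = \det(\lambda I_n - A_o)$ of the observer state matrix 
$A_o$, and $e_n = [\, 0 \ \cdots \ 0 \ 1 \,]^T \in \mathbb{R}^n$ is a unit vector. The characteristic 
polynomial of the matrix $A_o$ can thus be expressed as 

$$\varphi(\lambda) = \det(\lambda I_n - A_o) = \prod_{i=1}^{n} (\lambda - \lambda_i) = \sum_{i=0}^{n} \alpha_i \lambda^i, \quad \alpha_i \in \mathbb{R}, \quad \alpha_n = 1. \tag{5.82}$$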
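As a minimal sketch of the Ackermann formula (5.81), assuming real (or complex-conjugate) assigned eigenvalues and an observable pair, the gain can be evaluated as below. The function name and the example plant are assumptions introduced for illustration.

```python
import numpy as np

def ackermann_observer_gain(A, c, eigs):
    """k_o = phi(A) M_o^{-1} e_n, with phi the desired characteristic polynomial."""
    n = A.shape[0]
    # Observability matrix M_o with rows c^T, c^T A, ..., c^T A^(n-1)
    Mo = np.vstack([c @ np.linalg.matrix_power(A, i) for i in range(n)])
    alpha = np.poly(eigs)            # coefficients, highest power first
    # phi(A) = A^n + alpha_{n-1} A^{n-1} + ... + alpha_0 I
    phi_A = sum(a * np.linalg.matrix_power(A, n - j) for j, a in enumerate(alpha))
    e_n = np.zeros(n)
    e_n[-1] = 1.0
    return phi_A @ np.linalg.solve(Mo, e_n)

# Hypothetical second-order plant with a scalar output
A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])
c = np.array([1.0, 0.0])
k_o = ackermann_observer_gain(A, c, [0.2, 0.4])
A_o = A - np.outer(k_o, c)           # observer state matrix A - k_o c^T
```

For this example the gain works out to `[-3.6, 8.88]`, and the eigenvalues of `A_o` are the requested 0.2 and 0.4.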

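Consider now a method of observer synthesis in which one applies a 
similarity transformation assigning a similar pair of an observable canonical 
form to the pair $(A, c^T)$ describing the observed plant. Let thus $(\bar{A}, \bar{c}^T)$ 
denote such a similar canonical matrix pair, in which $\bar{A} \in \mathbb{R}^{n \times n}$ is a transpose 
Frobenius matrix and $\bar{c} = e_n \in \mathbb{R}^n$ is a unit vector: 

$$\bar{A} = \begin{bmatrix} 0 & 0 & \cdots & 0 & -a_0 \\ 1 & 0 & \cdots & 0 & -a_1 \\ 0 & 1 & \cdots & 0 & -a_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & -a_{n-1} \end{bmatrix}, \qquad \bar{c} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix}.$$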



The corresponding characteristic polynomial of the (open-loop) plant model 
represented by $\bar{A}$ then has the following form: 

$$\det(\lambda I_n - \bar{A}) = \sum_{i=0}^{n} a_i \lambda^i, \quad a_i \in \mathbb{R}, \quad a_n = 1.$$

As can be readily verified, the (closed-loop) observer matrix $\bar{A} - \bar{k}_o \bar{c}^T$ 
determined by a free-design vector $\bar{k}_o = [\, \bar{k}_0 \ \cdots \ \bar{k}_{n-1} \,]^T \in \mathbb{R}^n$ also 
has the transpose Frobenius structure, in which the consecutive elements of 
the last column take on the values $(-a_i - \bar{k}_i)$, $i = 0, \ldots, n-1$. Hence it results 
that for the dynamic object described by the equations $\bar{x}(k+1) = \bar{A}\bar{x}(k)$ 
and $y(k) = \bar{c}^T \bar{x}(k)$ one can arbitrarily shape the characteristic polynomial 
of the sought observer state-transition matrix by selecting the coordinates of 
the vector $\bar{k}_o$. 

That is, a matrix value taken on by this scalar mapping $\varphi(\lambda)$ in the matrix argument 
$A$. 

Which has also many other names, to mention the normal observer or observable 
canonical form, the series realisation, or the direct form I (see, for instance, 
Kowalczuk, 1989, and the references therein). 

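The prerequisite similarity transformation of a completely observable 
matrix pair $(A, c^T)$ into its similar image $(\bar{A}, \bar{c}^T) = (T_o^{-1} A T_o,\ c^T T_o)$ of the 
observable canonical form can be portrayed by $T_o = (W M_o)^{-1} \in \mathbb{R}^{n \times n}$, where 
$W = \bar{M}_o^{-1}$ is an inverse of the observability matrix $\bar{M}_o \in \mathbb{R}^{n \times n}$ of the pair 
$(\bar{A}, \bar{c}^T)$. It can be easily shown that $W$ is a symmetric upper anti-triangular 
matrix: 

$$W = \begin{bmatrix} a_1 & \cdots & a_{n-2} & a_{n-1} & 1 \\ a_2 & \cdots & a_{n-1} & 1 & 0 \\ \vdots & & & & \vdots \\ a_{n-1} & 1 & \cdots & 0 & 0 \\ 1 & 0 & \cdots & 0 & 0 \end{bmatrix},$$

appropriately composed of the coefficients of the characteristic polynomial 
of $A$: 

$$\det(\lambda I_n - A) = \sum_{i=0}^{n} a_i \lambda^i, \quad a_n = 1. \tag{5.83}$$

With this it is clear that the following relationship is fulfilled: $T_o = M_o^{-1} \bar{M}_o$. 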

As has already been mentioned, the full-order state observer is described 
by (5.80). The coordinates of the gain vector of this observer can be 
determined as $k_o = T_o \bar{k}_o$, where $\bar{k}_o = [\, \alpha_0 - a_0 \ \cdots \ \alpha_{n-1} - a_{n-1} \,]^T \in \mathbb{R}^n$ is 
a residue vector, resulting from the comparison of the characteristic 
polynomial of the state observer (5.82) with that of the object (5.83). As 
has been stated above (5.81), the necessary and sufficient condition for the 
applicability of the presented method of observer synthesis is the invertibility 
of the observability matrix $M_o$ of the pair $(A, c^T)$. 
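The canonical-form route can be sketched in the same spirit. The snippet builds the anti-triangular matrix $W$ from the plant's characteristic polynomial, forms $T_o = (W M_o)^{-1}$ and maps the coefficient differences back to the original coordinates. The function name and the example plant are assumptions for illustration, and `numpy.poly` is used here to obtain the polynomial coefficients.

```python
import numpy as np

def canonical_form_observer_gain(A, c, eigs):
    """k_o = T_o k_bar with T_o = (W M_o)^{-1} and k_bar_i = alpha_i - a_i."""
    n = A.shape[0]
    a = np.poly(A)[::-1]             # plant polynomial: a[i] multiplies lambda**i
    alpha = np.poly(eigs)[::-1]      # desired observer polynomial, same ordering
    Mo = np.vstack([c @ np.linalg.matrix_power(A, i) for i in range(n)])
    # W[i, j] = a_{i+j+1}, with a_n = 1 and zeros below the anti-diagonal
    W = np.array([[a[i + j + 1] if i + j + 1 <= n else 0.0
                   for j in range(n)] for i in range(n)])
    To = np.linalg.inv(W @ Mo)
    k_bar = alpha[:n] - a[:n]        # residue vector from comparing (5.82) and (5.83)
    return To @ k_bar

# Hypothetical second-order plant with a scalar output
A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])
c = np.array([1.0, 0.0])
k_o = canonical_form_observer_gain(A, c, [0.2, 0.4])
```

For a given observable pair and a given set of eigenvalues the scalar-output gain is unique, so this result should agree with the one delivered by the Ackermann formula (5.81).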

5.4.4. Detection observers of the Luenberger type 

Consider a linear dynamic system $R(r, y_r, u, y)$ characterised by the following 
equations: 

$$r(k+1) = A_r r(k) + B_{ru} B_u u(k) + B_{ry} y(k), \tag{5.84}$$

$$y_r(k) = C_r r(k) + D_{ry} y(k), \tag{5.85}$$

where $r(k) \in \mathbb{R}^{n_r}$ denotes a state vector of the system, $y_r(k) \in \mathbb{R}^{m_r}$ is 
a system output, and the signals $u(k) \in \mathbb{R}^{n_u}$ and $y(k) \in \mathbb{R}^{m}$ are 
defined analogously to those of the system model $P(x, y, u)$ described in (5.56) 
and (5.57). The matrices of the system $R(r, y_r, u, y)$ have compatible 
dimensions: $A_r \in \mathbb{R}^{n_r \times n_r}$, $B_{ru} \in \mathbb{R}^{n_r \times n}$, $B_{ry} \in \mathbb{R}^{n_r \times m}$, $C_r \in \mathbb{R}^{m_r \times n_r}$ and 
$D_{ry} \in \mathbb{R}^{m_r \times m}$. Furthermore, we assume that the matrix $C_r$ is non-zero. 

That is, the characteristic polynomial of the state transition matrix of either the 
observer or the plant. 

Our task consists in appropriately determining the distinguished 
elements of the system $R(r, y_r, u, y)$ so that, for an assumed weighting matrix 
$W_r \in \mathbb{R}^{m_r \times n}$, the output $y_r$ converges to a weighted state $W_r x$ of the process 
$P(x, y, u)$, independently of the applied input $u$ and the process state $x$, i.e., 
$\lim_{k \to \infty} y_e(k) = 0_{m_r}$ for $y_e(k)$ being now the following residual vector: 

$$y_e(k) = y_r(k) - W_r x(k) \in \mathbb{R}^{m_r}. \tag{5.86}$$

Note that $W_r x(k) \in \mathbb{R}^{m_r}$ can be interpreted as an output of a certain 
hypothetical linear static process $W_r$ (a referencing object) excited by $x(k)$. 
Consequently, such a task can be treated as an equivalent problem of 
designing an asymptotic 'output observer' (not the state one!) for a suitably 
defined system. A prospective solution $R(r, y_r, u, y)$ to this problem is 
referred to as the observer of Luenberger (1966). The practical usefulness of 
the above design formulation lies in two facts: (i) the order $n_r$ can be lower 
than the order $n$ of the 'original' system $P(x, y, u)$, and (ii) the condition 
for the existence of the output observer can be more easily fulfilled than the 
condition for the existence of the state observer (the complete observability of 
the pair $(A, C)$). An additional advantage of this approach is the possibility 
of performing beneficial parameterisations. 

It is thus of great importance that the Luenberger observer has all of 
the above-mentioned attributes (Chen and Patton, 1999; Fang et al., 2000). 
Therefore let us consider some structural demands that have to be fulfilled 
by the elements of the system $R(r, y_r, u, y)$ functioning as a Luenberger 
observer. To this end, let us define a subsidiary state-estimation error vector 
as 

$$e_r(k) = r(k) - B_{ru} x(k) \in \mathbb{R}^{n_r}, \tag{5.87}$$

for which, on the basis of (5.56), (5.57) and (5.84), we obtain 
$e_r(k+1) = A_r r(k) + (B_{ry} C - B_{ru} A)\,x(k)$. 

By stipulating a demand $B_{ry} C - B_{ru} A = -A_r B_{ru}$, we get the following 
error equation: $e_r(k+1) = A_r e_r(k)$. Hence there occurs an obvious stability 
requirement for the matrix $A_r$ which, for any initial condition $e_r(0)$, 
assures that the evolution process of the error vector $e_r(k)$ converges to zero: 
$\lim_{k \to \infty} e_r(k) = 0_{n_r}$. Considering, in turn, the equations (5.57) and (5.85) 
yields the formula $y_r(k) = C_r e_r(k) + (D_{ry} C + C_r B_{ru})\,x(k)$, on the basis of 
which (see (5.86)) we easily derive another demand for the elements of the 
system $R(r, y_r, u, y)$ as an observer of the process $W_r x(k)$. Namely, we state 
that the equality $D_{ry} C + C_r B_{ru} = W_r$ should be valid. The above 
deliberations constitute the principal reasoning for the following lemma on the necessary 
conditions for the existence of the Luenberger observer. 







Lemma 5.2. (On the properties of the elements of the detective Luenberger 
observer.) The matrices of the linear dynamic system $R(r, y_r, u, y)$, described 
by (5.84)-(5.85) and being an asymptotic Luenberger observer of the process 
$W_r x(k)$, have to fulfil the following conditions: 

$$\lambda(A_r) \subset \mathcal{V}(z), \tag{5.88}$$

$$B_{ru} A - A_r B_{ru} = B_{ry} C, \tag{5.89}$$

$$D_{ry} C + C_r B_{ru} = W_r. \tag{5.90}$$
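The three conditions of Lemma 5.2 are easy to check numerically for a candidate design. The sketch below is illustrative only: the function name is hypothetical, and the stability region in (5.88) is assumed here to be the open unit disc, as is usual for discrete-time systems.

```python
import numpy as np

def luenberger_conditions(A, C, Ar, Bru, Bry, Cr, Dry, Wr, tol=1e-9):
    """Return the truth values of (5.88), (5.89) and (5.90)."""
    # (5.88): assumed to mean all eigenvalues strictly inside the unit circle
    stable = bool(np.all(np.abs(np.linalg.eigvals(Ar)) < 1.0))
    # (5.89): B_ru A - A_r B_ru = B_ry C
    coupling = bool(np.allclose(Bru @ A - Ar @ Bru, Bry @ C, atol=tol))
    # (5.90): D_ry C + C_r B_ru = W_r
    output = bool(np.allclose(Dry @ C + Cr @ Bru, Wr, atol=tol))
    return stable, coupling, output

# Hypothetical full-order design: B_ru = I, C_r = C, D_ry = 0, hence W_r = C
A = np.array([[0.5, 1.0],
              [0.0, 0.8]])
C = np.array([[1.0, 0.0]])
Bry = np.array([[0.9],
                [0.1]])
Ar = A - Bry @ C
checks = luenberger_conditions(A, C, Ar, np.eye(2), Bry, C, np.zeros((1, 1)), C)
```

For this choice all three checks should hold: the coupling and output conditions are satisfied by construction, and the eigenvalues of `Ar` are approximately 0.71 and -0.31.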



Let us now consider the possibility of using the Luenberger observer as 
a detection observer. To that end, we assume the following simple model of 
a system fault: 

$$x(k+1) = A x(k) + B_u u(k) + f(k), \tag{5.91}$$

where 

$$f(k) = \begin{cases} 0_n, & k \le k_f, \\ f \neq 0_n, & k > k_f. \end{cases} \tag{5.92}$$



We now expect that suitable information on this fault will 
appear in the residual signal $y_e(k)$, resulting in $y_e(k) \neq 0_{m_r}$ for all $k > k_f + k_r$, 
where $k_r$ is an admissible reaction time of the observer. In particular, 
the latter parameter influences the shaped eigenvalues and the spectrum of 
the matrix $A_r$ that determine the speed of transient observation processes. 
At this point, it is worth pointing to a fundamental difference between the 
common observer and the detection observer. In the first case, when we strive 
for the convergence $\lim_{k \to \infty} e_r(k) = 0_{n_r}$, the quantity $W_r x(k)$, appearing 
in the output residue definition (5.86), has to be treated as unmeasurable, 
i.e. as linearly independent of $y(k)$. On the other hand, within diagnostic 
observation, it is the observer output residue $y_e(k)$, determined on the basis 
of the available (measurable) output $y(k)$ of the diagnosed process $P(x, y, u)$, 
that carries the most essential information. Therefore, in the latter case we 
can rationally assume a linear type of the process observation $W_r x(k)$: 

$$L_r y(k) = W_r x(k), \tag{5.93}$$

where $L_r \in \mathbb{R}^{m_r \times m}$ is a certain matrix which models this access to $y(k)$. 

From (5.57) and (5.93) it results that the matrix $W_r$ takes on the following 
form: $W_r = L_r C$. 

With the use of (5.84), (5.89) and (5.91), we can write an equation which 
illustrates the dependence of the subsidiary state-estimation error (residue) 
vector $e_r(k)$ on the fault $f(k)$: 



$$e_r(k+1) = A_r e_r(k) - B_{ru} f(k). \tag{5.94}$$

Now, by taking into account (5.86), (5.87) and (5.90), we achieve the 
following (external) relationship between the state estimation residue $e_r(k)$ 
and the output residue $y_e(k)$: 

$$y_e(k) = C_r e_r(k). \tag{5.95}$$

The above lays the foundation for the following simple lemma. 

Lemma 5.3. (On the properties of the elements of the detective Luenberger 
observer.) The matrices of the linear dynamic system $R_e(r, y_e, u, y)$: 

$$r(k+1) = A_r r(k) + B_{ru} B_u u(k) + B_{ry} y(k),$$

$$y_e(k) = C_r r(k) + D_{ey} y(k),$$

being an asymptotic detective Luenberger observer, have to fulfil the following 
conditions: 

$$\lambda(A_r) \subset \mathcal{V}(z),$$

$$B_{ru} A - A_r B_{ru} = B_{ry} C,$$

$$D_{ey} C + C_r B_{ru} = 0_{m_r \times n},$$

where 

$$D_{ey} = D_{ry} - L_r \in \mathbb{R}^{m_r \times m}.$$

A fault $f(k)$ described by (5.92) is asymptotically detectable if there 
exists a detection observer $R_e(r, y_e, u, y)$ with $\lim_{k \to \infty} y_e(k) \neq 0_{m_r}$ (Chen 
and Patton, 1999; Fang et al., 2000). 

Lemma 5.4. (On a sufficient condition for the asymptotic detectability of 
a fault.) If the pair $(A, C)$ is completely observable, then the fault $f(k)$ is 
asymptotically detectable. 

Proof. For simplicity, let us consider the case of full-order observers: $n_r = n$ 
and $m_r = m$. The proof has a constructive character, as we shall give the form 
of the detection observer. Let $A_r = A - B_{ry} C$, $B_{ru} = I_n$, $C_r = C$, and 
$D_{ey} = -I_m$, and let the matrix $B_{ry} \in \mathbb{R}^{n \times m}$ be chosen so that $\lambda(A_r) \subset \mathcal{V}(z)$. The 
latter is always possible on the basis of the assumed observability of the pair 
$(A, C)$ and consists in solving the corresponding task of the pole placement 
of the eigenvalues of the matrix $A_r$. As can be easily verified, the following 
formulae are now valid (confer (5.94) and (5.95)): $e_r(k+1) = A_r e_r(k) - f(k)$ 
and $y_e(k) = C e_r(k)$. Thus, in a steady state, the observer output arrives at 
the value $-C (I_n - A_r)^{-1} f$. The assumed observability of $(A, C)$ implies the 
observability of the pair $(A_r, C)$, which, in turn, means that with $f \neq 0_n$ the 
vector $C (I_n - A_r)^{-1} f$ has to be non-zero (see Lemma 5.7 on observability 
given in Chapter 6). Finally, note that in the analysed case it holds that 

In the next part of this subsection we shall consider the case of an 
asymptotic Luenberger observer for a system $P(x, y, u)$ of a slightly more complex 
structure: 

$$x(k+1) = A x(k) + B_u u(k) + E f(k), \tag{5.96}$$

$$y(k) = C x(k) + D_u u(k) + F f(k), \tag{5.97}$$

where $f(k)$ is a vector that models a fault, while $D_u$, $E$ and $F$ are matrices 
of compatible dimensions. The presented study is concise, as its results can 
be easily obtained by following the above reasoning (Chen and Patton, 1999). 
The proposed asymptotic observer $R(r, y_r, u, y)$ of the process $W_r x(k) \in \mathbb{R}^{m_r}$ 
has the form 

$$r(k+1) = A_r r(k) + B_{ru} u(k) + B_{ry} y(k), \tag{5.98}$$

$$y_r(k) = C_r r(k) + D_{ru} u(k) + D_{ry} y(k), \tag{5.99}$$

where $B_{ru} \in \mathbb{R}^{n_r \times n_u}$, $D_{ru} \in \mathbb{R}^{m_r \times n_u}$ and $D_{ry} \in \mathbb{R}^{m_r \times m}$. We also assume 
that $C_r \neq 0_{m_r \times n_r}$. Moreover, as has been done above, let us introduce the 
corresponding subsidiary state-estimation residue (see (5.87)) as 

$$e_r(k) = r(k) - T_{rx} x(k) \in \mathbb{R}^{n_r},$$

where the matrix $T_{rx} \in \mathbb{R}^{n_r \times n}$ represents a design parameterisation. 

Lemma 5.5. (On the properties of the elements of the Luenberger observer.) 
The matrices of the linear dynamic system $R(r, y_r, u, y)$, described by (5.98)- 
(5.99) and being an asymptotic Luenberger observer of the process $W_r x(k)$, 
have to satisfy the following conditions: 

$$\lambda(A_r) \subset \mathcal{V}(z), \tag{5.100a}$$

$$T_{rx} A - A_r T_{rx} = B_{ry} C, \tag{5.100b}$$

$$T_{rx} B_u - B_{ry} D_u = B_{ru}, \tag{5.100c}$$

$$D_{ry} C + C_r T_{rx} = W_r, \tag{5.100d}$$

$$D_{ru} + D_{ry} D_u = 0_{m_r \times n_u}. \tag{5.100e}$$



A sufficient condition for the existence of such an observer $R(r, y_r, u, y)$ 
is the fulfilment of the complete observability of the pair $(A, C)$ (Chen and 
Patton, 1999). Taking $W_r = C$ (which also means $m_r = m$) we can next 
define an estimate of the output $y(k)$ of the system $P(x, y, u)$ as 

$$\hat{y}(k) = y_r(k) + D_u u(k) \in \mathbb{R}^{m}.$$

A suitable observer output residue, being a basis for the detection of the 
fault $f(k)$, will now be assumed in the form of a weighted error of the output 
estimation (see also (5.86)): 

$$y_w(k) = W\big(y(k) - \hat{y}(k)\big) \in \mathbb{R}^{m_w},$$

where $W \in \mathbb{R}^{m_w \times m}$ is a weighting matrix. 



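Lemma 5.6. (On the properties of the elements of the detective Luenberger 
observer.) The matrices of the linear dynamic system $R_e(r, y_e, u, y)$: 

$$r(k+1) = A_r r(k) + B_{ru} u(k) + B_{ry} y(k), \tag{5.101}$$

$$y_w(k) = C_{wr} r(k) + D_{wu} u(k) + D_{wy} y(k), \tag{5.102}$$

being an asymptotic detection observer of the output $y(k)$ of the system 
$P(x, y, u)$, have to comply with the following prerequisites: 

$$C_{wr} = -W C_r \in \mathbb{R}^{m_w \times n_r}, \tag{5.103}$$

$$D_{wu} = -W (D_u + D_{ru}) \in \mathbb{R}^{m_w \times n_u}, \tag{5.104}$$

$$D_{wy} = W - W D_{ry} \in \mathbb{R}^{m_w \times m}, \tag{5.105}$$

and 

$$\lambda(A_r) \subset \mathcal{V}(z), \quad T_{rx} A - A_r T_{rx} = B_{ry} C, \quad T_{rx} B_u - B_{ry} D_u = B_{ru}, \tag{5.106}$$

$$D_{wy} C + C_{wr} T_{rx} = 0_{m_w \times n}, \tag{5.107}$$

$$D_{wu} + D_{wy} D_u = 0_{m_w \times n_u}. \tag{5.108}$$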



A structural scheme of the detective Luenberger observer is shown in 
Fig. 5.4. 

On the basis of the formulae (5.100b), (5.100c) and (5.101), we can derive 
(see (5.94)) the following equation of the evolution of the state estimation 
error $e_r(k)$: 

$$e_r(k+1) = A_r e_r(k) + (B_{ry} F - T_{rx} E)\, f(k), \tag{5.109}$$

where the fault-modelling signal $f(k)$ evidently acts as excitation. 

Similarly, taking into account (5.102) and (5.107)-(5.108), we can easily 
see that the information on the fault can show in the weighted 
observer-output residue $y_w(k)$ in both a direct and an indirect way, as we have the 
following internal representation of this vector (see (5.14) and (5.95), for 
instance): 

$$y_w(k) = C_{wr} e_r(k) + D_{wy} F f(k).$$

Consider now a simple example of a full-order detection observer. To 
this end, we assume that $T_{rx} = I_n$ and $C_r = C$, and in the equation (5.101) 
we put $A_r = A - B_{ry} C$ and $B_{ru} = B_u - B_{ry} D_u$, being careful to assure 
the stability of the state transition matrix $A_r$ by a suitable choice of the 
matrix $B_{ry} \in \mathbb{R}^{n \times m}$. With the postulated complete observability of the pair 
$(A, C)$, the latter boils down to solving a respective eigenvalue assignment 
problem for the matrix $A_r$. Moreover, as can be easily validated, a zero 
matrix $D_{ry} = 0_{m \times m}$ corresponds with $D_{ru} = 0_{m \times n_u}$, which means that 
now the following simple variants of the formulae (5.103)-(5.105) are valid: 
$C_{wr} = -WC \in \mathbb{R}^{m_w \times n}$, $D_{wu} = -W D_u \in \mathbb{R}^{m_w \times n_u}$ and $D_{wy} = W \in \mathbb{R}^{m_w \times m}$. 

Fig. 5.4. Detective Luenberger observer 

From the equations (5.101) and (5.102) there thus results the following 
operator representation of the detective Luenberger observer (the transient 
term originating from the non-zero initial conditions $r(0) \in \mathbb{R}^{n_r}$ has been 
neglected here): 

$$y_w(z) = G_{wu}(z)\,u(z) + G_{wy}(z)\,y(z),$$

where $u(z)$, $y(z)$ and $y_w(z)$ are discrete transforms of the respective signals, 
while 

$$G_{wu}(z) = C_{wr} (z I_{n_r} - A_r)^{-1} B_{ru} + D_{wu},$$

$$G_{wy}(z) = C_{wr} (z I_{n_r} - A_r)^{-1} B_{ry} + D_{wy}.$$

Thus, in the analysed case of the full-order observer, we obtain 

$$G_{wu}(z) = -W \big( C (z I_n - A_r)^{-1} (B_u - B_{ry} D_u) + D_u \big),$$

$$G_{wy}(z) = -W \big( C (z I_n - A_r)^{-1} B_{ry} - I_m \big).$$

Now, on the basis of (5.108) and (5.109), we derive the following operator 
formula, which describes the way of exposing the fault in the weighted residual 
signal: 

$$y_w(z) = G_{wf}(z)\, f(z),$$

where 

$$G_{wf}(z) = C_{wr} (z I_{n_r} - A_r)^{-1} (B_{ry} F - T_{rx} E) + D_{wy} F.$$

In the case of full-order observers, this transfer function acquires the 
form 

$$G_{wf}(z) = -W \big( C (z I_n - A_r)^{-1} (B_{ry} F - E) - F \big).$$
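A small simulation can make the full-order case concrete. The sketch below runs the plant (5.96)-(5.97) with a step fault of the type (5.92) together with the full-order observer, and compares the steady-state weighted residual with the static gain $G_{wf}(1)$ applied to the fault. All numerical values, the gain `Bry` and the function names are assumptions chosen for illustration; the only requirement used is that $A_r = A - B_{ry} C$ is stable.

```python
import numpy as np

def static_fault_gain(A, C, E, F, Bry, W):
    """G_wf(1) for the full-order detection observer."""
    n = A.shape[0]
    Ar = A - Bry @ C
    return -W @ (C @ np.linalg.solve(np.eye(n) - Ar, Bry @ F - E) - F)

def simulate_residual(A, Bu, C, Du, E, F, Bry, W, u, f, kf, steps):
    """Return the weighted residual y_w at the last simulated step."""
    n = A.shape[0]
    Ar, Bru = A - Bry @ C, Bu - Bry @ Du   # full-order choice, T_rx = I, C_r = C
    x, r = np.zeros(n), np.zeros(n)
    yw = None
    for k in range(steps):
        fk = f if k > kf else np.zeros_like(f)      # step fault as in (5.92)
        y = C @ x + Du @ u + F @ fk                 # plant output (5.97)
        yw = W @ (y - (C @ r + Du @ u))             # y_w = W (y - y_hat)
        x = A @ x + Bu @ u + E @ fk                 # plant state (5.96)
        r = Ar @ r + Bru @ u + Bry @ y              # observer state (5.101)
    return yw

# Hypothetical data: two states, one input, one output, two fault channels
A = np.array([[0.5, 1.0], [0.0, 0.8]])
Bu = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
Du = np.array([[0.0]])
E, F = np.eye(2), np.zeros((1, 2))
Bry = np.array([[0.9], [0.1]])          # makes A - Bry C stable
W = np.array([[1.0]])
f = np.array([0.2, -0.1])
yw_final = simulate_residual(A, Bu, C, Du, E, F, Bry, W,
                             u=np.array([1.0]), f=f, kf=20, steps=300)
yw_static = static_fault_gain(A, C, E, F, Bry, W) @ f
```

After the transient has decayed, `yw_final` should match `yw_static`, independently of the input `u`. A fault direction `f` for which `yw_static` is close to zero would be hard to detect asymptotically with this particular design.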

An estimate of the minimal admissible order of the detective 
Luenberger observer can be found in (Chen and Patton, 1999; Mironovski, 1979; 
1980). It is also worth emphasising that since for a given dynamic system 
described by an input-output model (like a transfer function) we can use a 
minimal canonical observable realisation in the state space, the detection 
observer for such a model always exists. In such practical courses of 
detection filter synthesis the designer's attention is thus focussed not on the 
issue of the existence of the observer but on the problem of its quality. 

In particular, in the simplest cases of asymptotic detection, we should 
at least attempt to assure as high a value of the static gain $G_{wf}(1)$ as possible. In 
more complex cases (of isolability), a 'directional structure' of this gain is of 
importance (see Subsections 5.2.1 and 5.2.3). Another degree of design 
freedom, associated with the possibility of shaping the dynamic properties 
of the detector $G_{wf}(z)$ via a suitable choice of a weighting matrix transfer 
function $W = W(z)$, according to the classical methods of the 
frequency-domain correction of dynamic systems, is also noteworthy. 



5.5. Linear Kalman filters 

One of the most effective tools used in the automatic control and 
diagnosis of dynamic processes is the Kalman filter (Kalman, 1960; Kalman and 
Bucy, 1961), being a state-space representation of the Wiener filter (Wiener, 
1950). Diverse conceptions and solutions to the problem of Kalman 
filtering (continuous-time, discrete-time, time-invariant and time-variant) can be 
found in the standard textbooks on estimation theory (Anderson and Moore, 
1979; Brown and Hwang, 1992; Davis, 1977; Gelb, 1988; Jazwinski, 1970; 
Kailath et al., 2000; Lewis, 1986; Maybeck, 1979). 

A Kalman filter is an algorithm used for estimating a vector of state 
variables of dynamic objects, which are subject to stochastic disturbances and 
modelled in state space, on the basis of noisy measurements of certain output 
quantities. Such an algorithm utilises in an intelligent way the entire 
information on the system's dynamics, the deterministic control signals and the probabilistic 
characteristics of both the stochastic disturbances affecting the object's state 
variables and the measurement noises perturbing the results of observations of 
the object outputs. 

As has already been mentioned, the Kalman filter (introduced at the turn 
of the 1950s and the 1960s) is an instance of the Luenberger observer (1966) 
optimised with respect to stochastic excitations. For the reader's convenience, 
elementary ideas and the basic facts relating to the issue of 
linear Kalman filters, which are introductory to the contents of this section, 
can be found in the Appendix at the end of this chapter. 

5.5.1. Models of estimated processes 

The fundamental discrete-time process used in this subsection is described 
both by the following linear state equation: 

$$x(k+1) = A(k)\,x(k) + B(k)\,v(k), \tag{5.110}$$

where $x(k) \in \mathbb{R}^n$ denotes a state vector and $v(k) \in \mathbb{R}^p$ is a vector 
representing a white-noise sequence of a zero mean value (that is generic for this 
discrete process), and by the following linear equation of the vector 
observations $y(k) \in \mathbb{R}^m$: 

$$y(k) = C(k)\,x(k) + w(k), \tag{5.111}$$

with $w(k) \in \mathbb{R}^m$ describing a discrete-observation white-noise sequence of 
a zero mean. The above processes $v(k)$, $w(k)$ and an initial state $x(0)$ are 
characterised as follows: 

$$\left\langle \begin{bmatrix} v(k) \\ w(k) \\ x(0) \end{bmatrix} \,\middle|\, \begin{bmatrix} v(i) \\ w(i) \\ x(0) \\ 1 \end{bmatrix} \right\rangle = \begin{bmatrix} Q(k)\delta_{ki} & S(k)\delta_{ki} & 0_{p \times n} & 0_p \\ S(k)^T\delta_{ki} & R(k)\delta_{ki} & 0_{m \times n} & 0_m \\ 0_{n \times p} & 0_{n \times m} & X(0) & 0_n \end{bmatrix}. \tag{5.112}$$

In the above we presume, among other things, that the initial state of zero 
mean and covariance $X(0)$ is uncorrelated with the two processes. From 
the equations (5.110)-(5.112) there result the following properties of the 
model. 

Characteristics 5.3. (Stochastic process model.) 

1. The processes $v(k)$ and $w(k)$ are uncorrelated with all the previous states 
and with the current state: 

$$v(k) \perp x(i), \quad w(k) \perp x(i), \quad i \le k. \tag{5.113}$$

2. The processes $v(k)$ and $w(k)$ are uncorrelated with all the previous 
observations: 

$$v(k) \perp y(i), \quad w(k) \perp y(i), \quad i \le k-1. \tag{5.114}$$

3. As regards the current observation, we have 

$$\langle v(k) \mid y(k) \rangle = S(k) \in \mathbb{R}^{p \times m}, \quad \langle w(k) \mid y(k) \rangle = R(k) \in \mathbb{R}^{m \times m}. \tag{5.115}$$

4. The state covariance $\langle x(k) \mid x(k) \rangle = X(k) \in \mathbb{R}^{n \times n}$ fulfils the following 
recursive relation: 

$$X(k+1) = A(k)\,X(k)\,A(k)^T + B(k)\,Q(k)\,B(k)^T, \quad k \ge 0, \tag{5.116}$$

for a given initial condition $X(0)$. 
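The recursion (5.116) is straightforward to iterate. The following sketch assumes, for brevity, time-invariant matrices (an assumption made here, not required by the model), and the function name is hypothetical.

```python
import numpy as np

def state_covariance(A, B, Q, X0, steps):
    """Iterate X(k+1) = A X(k) A^T + B Q B^T starting from X(0) = X0."""
    X = X0.copy()
    for _ in range(steps):
        X = A @ X @ A.T + B @ Q @ B.T
    return X

# Hypothetical scalar example with a = 0.5, b = 1, Q = 1, X(0) = 1
A = np.array([[0.5]])
B = np.array([[1.0]])
Q = np.array([[1.0]])
X_limit = state_covariance(A, B, Q, X0=np.array([[1.0]]), steps=60)
```

For this stable scalar process the iteration settles at the fixed point $Q b^2 / (1 - a^2) = 4/3$, which gives a quick sanity check of the recursion.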

5.5.2. Linear Kalman filtering founded on innovations 

Let $\hat{x}(k|k-1) \in \mathbb{R}^n$, being a projection of the state vector $x(k)$ on a linear 
subspace $L\{y(i), i \le k-1\}$ spanned by the observations $\{y(0), \ldots, y(k-1)\}$, 
denote a one-step optimal minimum-variance prediction of this state vector 
derived on the basis of these observations. An error of such an estimation of 
the state $x(k)$ can be defined as 

$$\tilde{x}(k|k-1) = x(k) - \hat{x}(k|k-1) \in \mathbb{R}^n, \tag{5.117}$$

while its respective covariance matrix is settled as 
$P(k|k-1) = \langle \tilde{x}(k|k-1) \mid \tilde{x}(k|k-1) \rangle \in \mathbb{R}^{n \times n}$. 

As can be easily verified, the following inclusions are valid: 
$\hat{x}(k|k-1) \in L\{x(0); y(i), i \le k-1\} \subset L\{x(0); v(i), w(i), i \le k-1\}$, hence we conclude 
that $v(k) \perp \hat{x}(k|k-1)$ and $w(k) \perp \hat{x}(k|k-1)$. On the other hand, from the 
definition of the estimate $\hat{x}(k|k-1)$ it results that the vector $x(k)$ can be 
represented in the form of a sum of two orthogonal terms (see (5.117)): 

$$x(k) = \hat{x}(k|k-1) + \tilde{x}(k|k-1), \quad \hat{x}(k|k-1) \perp \tilde{x}(k|k-1). \tag{5.118}$$

By projecting $y(k)$ on $L\{y(i), i \le k-1\}$ we acquire 

$$\hat{y}(k|k-1) = C(k)\,\hat{x}(k|k-1) + \hat{w}(k|k-1).$$

Taking into account the fact that $w(k) \perp y(i)$ for all $i \le k-1$, 
we have $\hat{w}(k|k-1) = 0_m$. On the above basis, we can write an affine relation 
between the innovation $e(k)$ and the one-step prediction $\hat{x}(k|k-1)$ of the 
state 

$$e(k) = y(k) - C(k)\,\hat{x}(k|k-1), \tag{5.119}$$

where $e(0) = y(0)$. The covariance of the innovation process is denoted by 
$\langle e(k) \mid e(i) \rangle = T(k)\,\delta_{ki} \in \mathbb{R}^{m \times m}$. 

From (5.111), (5.117) and (5.119) it results that 

$$e(k) = C(k)\,\tilde{x}(k|k-1) + w(k). \tag{5.120}$$

By virtue of the above we obtain 

$$T(k) = C(k)\,P(k|k-1)\,C(k)^T + R(k). \tag{5.121}$$

In the forthcoming deliberations we assume a positive definiteness (and 
hence also invertibility) of the matrix $T(k) > 0$. This assumption can be 
fulfilled by presuming a positive definite matrix $R(k) > 0$. 

Consider now the task consisting in the determination of recursive 
relationships that bind the next one-step prediction $\hat{x}(k+1|k)$ of the state with 
its current prediction $\hat{x}(k|k-1)$. To this end let us write the formula of an 
optimal estimator (5.A10), described in the Appendix, in the following form: 

$$\begin{aligned} \hat{x}(k+1|k) &= \sum_{i=0}^{k} \langle x(k+1) \mid e(i) \rangle\, T(i)^{-1} e(i) \\ &= \sum_{i=0}^{k-1} \langle x(k+1) \mid e(i) \rangle\, T(i)^{-1} e(i) + \langle x(k+1) \mid e(k) \rangle\, T(k)^{-1} e(k) \\ &= \hat{x}(k+1|k-1) + \langle x(k+1) \mid e(k) \rangle\, T(k)^{-1} e(k). \end{aligned}$$

By projecting $x(k+1)$ on $L\{y(i), i \le k-1\}$ and observing that 
$v(k) \perp y(i)$ for all $i \le k-1$, we obtain 

$$\hat{x}(k+1|k-1) = A(k)\,\hat{x}(k|k-1) + B(k)\,\hat{v}(k|k-1) = A(k)\,\hat{x}(k|k-1).$$

Hence the sought recursive relationships acquire the form 

$$\begin{cases} e(k) = y(k) - C(k)\,\hat{x}(k|k-1), \quad e(0) = y(0), \\ \hat{x}(k+1|k) = A(k)\,\hat{x}(k|k-1) + \langle x(k+1) \mid e(k) \rangle\, T(k)^{-1} e(k), \quad k \ge 0, \end{cases} \tag{5.122}$$

or, equivalently, 

$$\begin{aligned} \hat{x}(k+1|k) = {} & \big(A(k) - \langle x(k+1) \mid e(k) \rangle\, T(k)^{-1} C(k)\big)\, \hat{x}(k|k-1) \\ & + \langle x(k+1) \mid e(k) \rangle\, T(k)^{-1} y(k), \quad \hat{x}(0|-1) = 0_n, \quad k \ge 0. \end{aligned}$$

It thus remains to determine the quantity $\langle x(k+1) \mid e(k) \rangle$. From (5.110) 
it results that 

$$\langle x(k+1) \mid e(k) \rangle = A(k)\,\langle x(k) \mid e(k) \rangle + B(k)\,\langle v(k) \mid e(k) \rangle.$$

By implementing a postulate of thorough modelling. 

The internal product appearing in the first term of this sum obtains the 
structure 

$$\begin{aligned} \langle x(k) \mid e(k) \rangle &= \langle x(k) \mid C(k)\,\tilde{x}(k|k-1) + w(k) \rangle \\ &= \langle x(k) \mid \tilde{x}(k|k-1) \rangle\, C(k)^T + \langle x(k) \mid w(k) \rangle = P(k|k-1)\,C(k)^T. \end{aligned}$$

The second of the examined internal products can be computed by the 
following method: 

$$\begin{aligned} \langle v(k) \mid e(k) \rangle &= \langle v(k) \mid C(k)\,\tilde{x}(k|k-1) + w(k) \rangle \\ &= \langle v(k) \mid \tilde{x}(k|k-1) \rangle\, C(k)^T + \langle v(k) \mid w(k) \rangle = S(k). \end{aligned}$$

Hence we obtain 

$$\langle x(k+1) \mid e(k) \rangle = A(k)\,P(k|k-1)\,C(k)^T + B(k)\,S(k). \tag{5.123}$$

The problem can thus be reduced to the recursive determination of the 
covariance matrix $P(k|k-1)$. Letting 

$$\begin{aligned} K^{+}(k) &= \langle x(k+1) \mid e(k) \rangle\, T(k)^{-1} \\ &= \big(A(k)\,P(k|k-1)\,C(k)^T + B(k)\,S(k)\big)\, T(k)^{-1} \in \mathbb{R}^{n \times m}, \end{aligned}$$

the evolution of the estimate $\hat{x}(k|k-1)$ can be described by the following 
state equation: 

$$\hat{x}(k+1|k) = A(k)\,\hat{x}(k|k-1) + K^{+}(k)\,e(k), \quad \hat{x}(0|-1) = 0_n, \quad k \ge 0,$$

where the innovation $e(k)$ appears to be the generic white noise of the 
process. Consequently, the covariance matrix 
$\langle \hat{x}(k|k-1) \mid \hat{x}(k|k-1) \rangle = \hat{X}(k|k-1) \in \mathbb{R}^{n \times n}$ fulfils the succeeding recursive 
relationship (compare (5.116)) for $k \ge 0$: 

$$\hat{X}(k+1|k) = A(k)\,\hat{X}(k|k-1)\,A(k)^T + K^{+}(k)\,T(k)\,K^{+}(k)^T, \quad \hat{X}(0|-1) = 0_{n \times n}. \tag{5.124}$$

Alternately, from (5.117) it results that $X(k) = \hat{X}(k|k-1) + P(k|k-1)$. 
By considering (5.116) and (5.124), we gain the standard formula of a 
discrete-time Riccati equation: 

$$P(k+1|k) = A(k)\,P(k|k-1)\,A(k)^T + B(k)\,Q(k)\,B(k)^T - K^{+}(k)\,T(k)\,K^{+}(k)^T.$$

The above leads to the so-called innovative version of the Kalman filter 
(Anderson and Moore, 1979; Gelb, 1988; Kailath et al., 2000; Lewis, 1986). 

Corollary 5.1. (An innovative version of Kalman filters.) Consider the 
model (5.110)-(5.112) of the estimated process of $x(k)$. A recursive scheme 
of determining the one-step state prediction based on the innovation $e(k)$ of 
the process of observing $y(k)$ is given as 

$$e(k) = y(k) - C(k)\,\hat{x}(k|k-1), \quad \hat{x}(0|-1) = 0_n, \quad e(0) = y(0), \tag{5.125}$$

$$\hat{x}(k+1|k) = A(k)\,\hat{x}(k|k-1) + K^{+}(k)\,e(k), \quad k \ge 0, \tag{5.126}$$

$$K^{+}(k) = \big(A(k)\,P(k|k-1)\,C(k)^T + B(k)\,S(k)\big)\,T(k)^{-1}, \tag{5.127}$$

$$T(k) = C(k)\,P(k|k-1)\,C(k)^T + R(k), \tag{5.128}$$

$$P(k+1|k) = A(k)\,P(k|k-1)\,A(k)^T + B(k)\,Q(k)\,B(k)^T - K^{+}(k)\,T(k)\,K^{+}(k)^T, \quad P(0|-1) = X(0). \tag{5.129}$$
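The recursion of Corollary 5.1 can be sketched in a few lines. For brevity the sketch assumes time-invariant matrices, which is an assumption made here (the corollary allows time-varying ones), and the function name is hypothetical.

```python
import numpy as np

def innovations_kalman_predictor(A, B, C, Q, R, S, X0, ys):
    """One-step predictor following (5.125)-(5.129) for constant matrices."""
    n = A.shape[0]
    x_hat = np.zeros(n)              # x_hat(0|-1) = 0_n
    P = X0.copy()                    # P(0|-1) = X(0)
    innovations = []
    for y in ys:
        e = y - C @ x_hat                                   # (5.125)
        T = C @ P @ C.T + R                                 # (5.128)
        K = (A @ P @ C.T + B @ S) @ np.linalg.inv(T)        # (5.127)
        x_hat = A @ x_hat + K @ e                           # (5.126)
        P = A @ P @ A.T + B @ Q @ B.T - K @ T @ K.T         # (5.129)
        innovations.append(e)
    return x_hat, P, np.array(innovations)

# Hypothetical scalar example: a = 0.5, b = c = 1, Q = R = 1, S = 0, X(0) = 1
A = np.array([[0.5]]); B = np.array([[1.0]]); C = np.array([[1.0]])
Q = np.array([[1.0]]); R = np.array([[1.0]]); S = np.array([[0.0]])
x1, P1, es = innovations_kalman_predictor(A, B, C, Q, R, S,
                                          X0=np.array([[1.0]]),
                                          ys=[np.array([2.0])])
```

In this example the first step gives $T(0) = 2$, $K^{+}(0) = 0.25$, $\hat{x}(1|0) = 0.5$ and $P(1|0) = 1.125$. Note that `P` and `K` do not depend on the data `ys`, so they could equally be computed off-line.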



As a complement to the above results we propose the following five 
commentaries: 

(1) It is assumed that the matrices $A(k)$, $B(k)$, $C(k)$, $Q(k)$, $R(k)$, $S(k)$ 
and $X(0)$ are known a priori. The matrices $P(k|k-1)$, $K^{+}(k)$ and $T(k)$ 
are deterministic quantities (and do not depend on the current observation 
data) and, as such, can be determined off-line. 

(2) The initial condition $P(0|-1) = X(0)$ results from the 
assumption that $\hat{x}(0|-1) = 0_n$, as in such cases 
$P(0|-1) = \langle x(0) - \hat{x}(0|-1) \mid x(0) - \hat{x}(0|-1) \rangle = \langle x(0) \mid x(0) \rangle = X(0)$. 

(3) Equation (5.129) can also be derived in a somewhat different
way. From (5.120) and (5.126) it follows that $\hat{x}(k+1|k) = A(k)\hat{x}(k|k-1) +
K^+(k)\left(C(k)\tilde{x}(k|k-1) + w(k)\right)$. Moreover, by taking into account
(5.110)-(5.117), we obtain the following affine relation, in which the two
externally perturbing processes appear explicitly:

$$\tilde{x}(k+1|k) = \left(A(k) - K^+(k)C(k)\right)\tilde{x}(k|k-1) + \begin{bmatrix} B(k) & -K^+(k) \end{bmatrix} \begin{bmatrix} v(k) \\ w(k) \end{bmatrix}.$$

On this basis we obtain the recurrent equation

$$P(k+1|k) = \left(A(k) - K^+(k)C(k)\right)P(k|k-1)\left(A(k) - K^+(k)C(k)\right)^T + \begin{bmatrix} B(k) & -K^+(k) \end{bmatrix} \begin{bmatrix} Q(k) & S(k) \\ S(k)^T & R(k) \end{bmatrix} \begin{bmatrix} B(k)^T \\ -K^+(k)^T \end{bmatrix}, \tag{5.130}$$

which, with the use of (5.127), can easily be simplified to the form (5.129).
Formula (5.130) is evidently more general than (5.129), since it admits
an arbitrary (non-optimal) gain of the estimator. Note that it is the choice
of $K^+(k)$ given by (5.127) that guarantees the orthogonality of the
state-estimation error and the state estimate, $\tilde{x}(k|k-1) \perp \hat{x}(k|k-1)$,
and, consequently, assures the validity of the recipe (5.129).
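The relation between (5.129) and (5.130) can be checked numerically. The sketch below is an illustration only: it assumes NumPy, the function names and all numerical values are ours, and the "detuned" gain is an arbitrary perturbation chosen to show that a non-optimal gain inflates the error covariance.

```python
import numpy as np

def optimal_gain(A, B, C, R, S, P):
    """Gain (5.127) together with the innovation covariance (5.128)."""
    T = C @ P @ C.T + R
    return (A @ P @ C.T + B @ S) @ np.linalg.inv(T), T

def riccati_any_gain(A, B, C, Q, R, S, P, K):
    """Error-covariance step (5.130), valid for an arbitrary predictor gain K."""
    A_cl = A - K @ C
    G = np.hstack([B, -K])                      # [ B  -K ]
    noise = np.block([[Q, S], [S.T, R]])        # joint covariance of v(k) and w(k)
    return A_cl @ P @ A_cl.T + G @ noise @ G.T

# Assumed example data (correlated process and measurement noise, S != 0).
A = np.array([[1.0, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
Q = np.array([[0.04]])
R = np.array([[0.25]])
S = np.array([[0.05]])
P = np.eye(2)

K_opt, T = optimal_gain(A, B, C, R, S, P)
P_opt = riccati_any_gain(A, B, C, Q, R, S, P, K_opt)          # (5.130) with the optimal gain
P_short = A @ P @ A.T + B @ Q @ B.T - K_opt @ T @ K_opt.T     # (5.129)
K_off = K_opt + np.array([[0.1], [-0.1]])                     # deliberately detuned gain
P_off = riccati_any_gain(A, B, C, Q, R, S, P, K_off)
```

For the optimal gain the two formulae agree, while for the detuned gain the covariance grows by the positive semidefinite term $(K - K^+) T (K - K^+)^T$, which follows from completing the square in (5.130).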

(4) The equation set (5.125)-(5.126) can be interpreted as the so-called
innovation representation (model) of the observation process for $y(k)$:

$$\hat{x}(k+1|k) = A(k)\hat{x}(k|k-1) + K^+(k)e(k), \qquad \hat{x}(0|-1) = 0_n,$$

$$y(k) = C(k)\hat{x}(k|k-1) + e(k), \qquad k \ge 0,$$

where the role of the process generating and disturbing the observations
is taken by the white innovation process $e(k)$. This representation
is a causal and causally invertible model, as we have

$$\hat{x}(k+1|k) = \left(A(k) - K^+(k)C(k)\right)\hat{x}(k|k-1) + K^+(k)y(k), \qquad \hat{x}(0|-1) = 0_n,$$

$$e(k) = -C(k)\hat{x}(k|k-1) + y(k), \qquad k \ge 0.$$

The latter means that, knowing $y(k)$, we can retrieve $e(k)$.
It is noteworthy that, although causal, the system (5.110)-(5.111) need not
be, in general, a causally invertible model, since the reconstruction of
$v(k)$, $w(k)$ and $x(0)$ on the basis of $y(k)$ may be infeasible. Moreover,
let us notice that the innovation representation of the observation process
for $y(k)$ involves fewer parameters and is thus a more parsimonious
description than the fundamental system model (5.110)-(5.111).
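The causal invertibility of the innovation representation can be illustrated with a scalar, time-invariant sketch in plain Python. The function names `shape` and `whiten`, the constant gain `k_plus` and all numbers are assumptions made for this example; the gain need not be the optimal one for the inversion to work.

```python
def shape(e_seq, a, c, k_plus):
    """Innovation model: map innovations e(k) to observations y(k)."""
    x_pred, y_seq = 0.0, []                   # x(0|-1) = 0
    for e in e_seq:
        y_seq.append(c * x_pred + e)          # y(k) = C x(k|k-1) + e(k)
        x_pred = a * x_pred + k_plus * e      # x(k+1|k) = A x(k|k-1) + K+ e(k)
    return y_seq

def whiten(y_seq, a, c, k_plus):
    """Inverse (whitening) model: recover e(k) causally from y(k)."""
    x_pred, e_seq = 0.0, []                   # x(0|-1) = 0
    for y in y_seq:
        e_seq.append(y - c * x_pred)          # e(k) = -C x(k|k-1) + y(k)
        x_pred = (a - k_plus * c) * x_pred + k_plus * y   # predictor driven by y(k)
    return e_seq

e_in = [1.0, -0.5, 0.25, 0.0]
y_out = shape(e_in, a=0.9, c=1.0, k_plus=0.4)
e_back = whiten(y_out, a=0.9, c=1.0, k_plus=0.4)
```

Each sample of `e_back` depends only on present and past samples of `y_out`, which is the causal invertibility referred to in the text.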

(5) Eliminating the explicit presence of innovations from the derived
formulae leads to the following form of the so-called observer structure of
the Kalman filter:

$$\hat{x}(k+1|k) = A^+(k)\hat{x}(k|k-1) + K^+(k)y(k), \qquad \hat{x}(0|-1) = 0_n, \quad k \ge 0,$$

where

$$A^+(k) = A(k) - K^+(k)C(k) \in \mathbb{R}^{n \times n},$$

$$P(k+1|k) = A^+(k)P(k|k-1)A^+(k)^T + \begin{bmatrix} B(k) & -K^+(k) \end{bmatrix} \begin{bmatrix} Q(k) & S(k) \\ S(k)^T & R(k) \end{bmatrix} \begin{bmatrix} B(k)^T \\ -K^+(k)^T \end{bmatrix}.$$

The discussed algorithms of the Kalman filter are based on the recurrent
processing of the optimal^20 one-step predicted estimate $\hat{x}(k|k-1)$ of
the state $x(k)$, without distinguishing the phase in which the optimal
current (filtered or a posteriori) estimate $\hat{x}(k|k)$ of the state $x(k)$
is determined.

^20 In the sense of linear least-mean-squares, L-LMS.

Let us therefore consider the following problem defined for the basic
model (5.110)-(5.112). Given the optimal one-step prediction $\hat{x}(k|k-1)$ of
the state $x(k)$ and the current observation $y(k)$, let us determine the
optimal current estimate $\hat{x}(k|k)$ and the covariance characteristics
$P(k|k) = (\tilde{x}(k|k) \mid \tilde{x}(k|k)) \in \mathbb{R}^{n \times n}$ of the state
estimation error $\tilde{x}(k|k) = x(k) - \hat{x}(k|k) \in \mathbb{R}^n$
by performing a suitable update of the vector $\hat{x}(k|k-1)$ and the matrix
$P(k|k-1)$ on the basis of the current observation $y(k)$.

According to the prescription (5.A10) of the Appendix, we can write down
the output form of the optimal estimator:

$$\begin{aligned} \hat{x}(k|k) &= \sum_{i=0}^{k} (x(k) \mid e(i))\,T(i)^{-1}e(i) \\ &= \sum_{i=0}^{k-1} (x(k) \mid e(i))\,T(i)^{-1}e(i) + (x(k) \mid e(k))\,T(k)^{-1}e(k) \\ &= \hat{x}(k|k-1) + (x(k) \mid e(k))\,T(k)^{-1}e(k). \end{aligned} \tag{5.131}$$

We should also emphasise here that the principal problem has not yet
been solved, as $\hat{x}(k|k) \ne \hat{x}(k|k-1) + (x(k) \mid y(k))(y(k) \mid y(k))^{-1}y(k)$.
In the discussed case, with (5.118) and (5.120), we have

$$\begin{aligned} (x(k) \mid e(k)) &= (x(k) \mid C(k)\tilde{x}(k|k-1) + w(k)) \\ &= (x(k) \mid \tilde{x}(k|k-1))\,C(k)^T + (x(k) \mid w(k)) \\ &= P(k|k-1)C(k)^T. \end{aligned} \tag{5.132}$$

According to (5.131) and (5.132), the error of the estimate $\hat{x}(k|k)$ has
the form

$$\tilde{x}(k|k) = x(k) - \hat{x}(k|k) = x(k) - \hat{x}(k|k-1) - K(k)e(k) = \tilde{x}(k|k-1) - K(k)e(k), \tag{5.133}$$

where

$$K(k) = P(k|k-1)C(k)^T T(k)^{-1} \in \mathbb{R}^{n \times m}.$$

By virtue of the above, we acquire

$$\begin{aligned} P(k|k) = {} & (\tilde{x}(k|k-1) \mid \tilde{x}(k|k-1)) - K(k)\,(e(k) \mid \tilde{x}(k|k-1)) \\ & - (\tilde{x}(k|k-1) \mid e(k))\,K(k)^T + K(k)\,(e(k) \mid e(k))\,K(k)^T. \end{aligned} \tag{5.134}$$

With $w(k) \perp \tilde{x}(k|k-1)$ in mind, we derive an additional description:

$$(e(k) \mid \tilde{x}(k|k-1)) = (C(k)\tilde{x}(k|k-1) + w(k) \mid \tilde{x}(k|k-1)) = C(k)P(k|k-1),$$

on the basis of which it follows that

$$K(k)\,(e(k) \mid \tilde{x}(k|k-1)) = P(k|k-1)C(k)^T T(k)^{-1} C(k)P(k|k-1) = (\tilde{x}(k|k-1) \mid e(k))\,K(k)^T. \tag{5.135}$$

The preceding deliberations lead us directly to the following
corollary.

Corollary 5.2. (Updating state predictions in Kalman filters based on the
current observations.)

i) Given the model (5.110)-(5.112) of the estimated process $x(k)$, the
predicted estimate $\hat{x}(k|k-1)$ of the state $x(k)$ is subject to updating
based on the current observation $y(k)$ according to the following:

$$\hat{x}(k|k) = \hat{x}(k|k-1) + K(k)\left(y(k) - C(k)\hat{x}(k|k-1)\right). \tag{5.136}$$

ii) The covariance characteristics of the error $\tilde{x}(k|k)$ of this posterior
estimate $\hat{x}(k|k)$ of the state are given as

$$P(k|k) = P(k|k-1) - K(k)T(k)K(k)^T = P(k|k-1) - P(k|k-1)C(k)^T T(k)^{-1} C(k)P(k|k-1). \tag{5.137}$$
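The update of Corollary 5.2 can be sketched as follows. This is a minimal illustration assuming NumPy; the function name `measurement_update` and the small numerical example are ours and serve only to make the formulae (5.136)-(5.137) concrete.

```python
import numpy as np

def measurement_update(C, R, x_pred, P_pred, y):
    """Update x(k|k-1), P(k|k-1) to x(k|k), P(k|k) using (5.136)-(5.137)."""
    T = C @ P_pred @ C.T + R                       # innovation covariance (5.128)
    K = P_pred @ C.T @ np.linalg.inv(T)            # filter gain K(k)
    x_filt = x_pred + K @ (y - C @ x_pred)         # (5.136)
    P_filt = P_pred - K @ T @ K.T                  # (5.137)
    return x_filt, P_filt, K

# Assumed example: only the first state component is measured.
C = np.array([[1.0, 0.0]])
R = np.array([[2.0]])
x_pred = np.array([1.0, 0.0])
P_pred = np.diag([2.0, 1.0])

x_filt, P_filt, K = measurement_update(C, R, x_pred, P_pred, np.array([3.0]))
# Here T = 4 and K = [0.5, 0]^T, so x_filt = [2, 0] and P_filt = diag(1, 1).
```

In this example the uncertainty of the measured component is halved, whereas the unmeasured and uncorrelated second component is left unchanged.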

In the deliberations presented above we have focused our attention
on the innovative version of the Kalman filter. Complementary information
on this subject, as well as on other concepts of Kalman filtering, can be
found in the extensive literature, which covers, for instance, innovative
filtering, factorised (square-root) filters, covariance filtering and the
linearisation of nonlinear state equations and/or observation equations,
leading to schemes referred to as extended Kalman filters (Anderson and
Moore, 1979; Bierman, 1977; Chui and Chen, 1989; 1991; Haykin, 1996; Kailath
et al., 2000; Lewis, 1986; Maybeck, 1979; Söderström, 1994).

As a supplement to the presented discussion, it is worth formulating five 
commentaries, concerning mainly the probabilistic properties of the Kalman 
filters. 

(6) The solution presented as the Kalman filter is an optimal minimum- 
variance state estimator within the class of linear systems for arbitrary prob- 
ability density (distribution) functions of both the disturbing processes and 
the error of the initial state estimate. The latter distributions are only as- 
sumed to be known in terms of the first two (finite) moments. 

(7) When both the disturbing processes and the initial state error are
Gaussian processes, the described estimator is also an optimal minimum-variance
state estimator, while its linear framework is a consequence of
the above Gaussian hypothesis and need not be assumed a priori. In
that case the state estimates can be interpreted as the conditional expected
values $\hat{x}(k|k-1) = E[x(k) \mid y(0), \ldots, y(k-1)]$ and $\hat{x}(k|k) =
E[x(k) \mid y(0), \ldots, y(k)]$, suitably defined by means of the Gaussian
conditional distributions $p(x(k) \mid y(0), \ldots, y(k-1)) = N(\hat{x}(k|k-1), P(k|k-1))$
and $p(x(k) \mid y(0), \ldots, y(k)) = N(\hat{x}(k|k), P(k|k))$. Moreover, $P(k|k-1)$ and
$P(k|k)$ are both the unconditional covariances of the state estimation errors
and the respective conditional covariances. The recursively determined
quantities $\hat{x}(k|k-1)$ and $\hat{x}(k|k)$ can also be interpreted as estimates
that maximise the corresponding a posteriori probabilities (MAP).

(8) In the above-mentioned non-Gaussian case, the state estimates
$\hat{x}(k|k-1)$ and $\hat{x}(k|k)$ supplied by the Kalman filter do not, in
general, have the status of the conditional expected values: $\hat{x}(k|k-1) \ne
E[x(k) \mid y(0), \ldots, y(k-1)]$ and $\hat{x}(k|k) \ne E[x(k) \mid y(0), \ldots, y(k)]$.

(9) The minimum-variance state estimate $\hat{x}(k|k-1)$ is characterised
by the following orthogonality property: the state observations and the
errors $\tilde{x}(k|k-1)$ are uncorrelated. Let $h(y(0), \ldots, y(k-1))$ denote
any function determined on the set (sequence) of all state observations
until the previous time instant $k-1$. It then holds that
$h(y(0), \ldots, y(k-1)) \perp (x(k) - E[x(k) \mid y(0), \ldots, y(k-1)])$. In the Gaussian
case this means that the observations (as well as any function defined on
them) and the estimation error are statistically independent, whereas in the
non-Gaussian case the above orthogonality property holds for the estimate
$\hat{x}(k|k-1)$ on condition that $h(y(0), \ldots, y(k-1))$ is a linear function
of the observations. In the latter case the uncorrelatedness of the
minimum-variance linear state-estimation error and the linear function of the
observations does not imply any statistical independence of these quantities.

(10) On a similar basis, when the Gaussian hypothesis does not hold, 
the innovations remain uncorrelated, although, in general, they are not sta- 
tistically independent. 



5.6. Summary 

In this chapter we have considered the issue of designing diagnostic systems 
on the basis of methodological tools existing in the field of control theory, and, 
in particular, in the area of the modelling and state estimation of dynamic 
objects. 

The widely applied analytical redundancy method, based on a credible
model of the monitored object with additive fault signals, makes it possible
to construct a consistency model (guaranteeing the consistency or parity of
the measurement data in a fault-free state of the object) and to implement
a residue generator (signalling the existence of a fault in the object by
means of non-zero residues).

It has thus been shown how, depending on the way of describing the
diagnosed object (input-output, transfer function, state-space, deterministic
or stochastic), one can construct transfer function and state consistency
relations, diagnostic observers and Kalman filters.

The next two chapters of this book are dedicated to the optimal de- 
sign of detection observers with the purpose of decoupling the residues from 
disturbances and measurement noises, as well as from modelling and imple- 
mentation errors. 

As in this chapter we have focussed our attention on faults modelled 
additively, it is also worth mentioning here that there is another case of non- 
additive defects, where, for instance, the fault consists in a change of values of 
the observed object’s parameters. For the purpose of the detection, isolation 
and estimation of this type of errors, we can apply standard methods of 
parametric identification (Ljung, 1987; Soderstrom, 1994) and construct the 
residues as deviations of the identified parameters from their known nominal 
values (Isermann, 1984; 1989). 



Appendix 

In the material below we have gathered elementary notions, notations, defi- 
nitions and fundamental facts related to the issue of linear Kalman filters. 

Let $X$ be a certain non-empty set, and let an ordered pair $(X, S)$ denote
a linear space over a predetermined ring of ‘scalars’ $S$. This means that
for the elements (vectors) in $X$ we have defined the operations of addition
$x + y \in X$ and multiplication by a scalar $\alpha x \in X$, $\forall x, y \in X$,
$\forall \alpha \in S$, and these operations are characterised by the following
properties (Faith, 1981; Roman, 1992): $x + y = y + x$, $(x + y) + z = x + (y + z)$,
$\alpha(x + y) = \alpha x + \alpha y$, $(\alpha + \beta)x = \alpha x + \beta x$,
$(\alpha\beta)x = \alpha(\beta x)$, $0x = \theta$ and $1x = x$, where $0$ and $1$
are the zero and unit elements of the ring $S$, and $\theta$ is the zero vector
in $X$. In the ‘simplest’ case, $S$ can be the field of real numbers,
$S = \mathbb{R}$. Note that for a given $X$ one can define various linear spaces,
depending on the definition of the ring $S$. It is also clear that, in general,
multiplication in a ring $S$ is not a commutative operation, which concerns,
for instance, the matrix ring $S = \mathbb{R}^{n \times n}$ with matrix addition and
multiplication.

Let $(\cdot \mid \cdot)$ denote an inner product defined on a space $(X, S)$,
that is, a (bilinear) mapping $(\cdot \mid \cdot) : X \times X \to S$ which meets
the two universal conditions: $(\alpha x + \beta y \mid z) = \alpha(x \mid z) +
\beta(y \mid z)$ and $(y \mid x) = (x \mid y)^*$, where the involution
$* : S \to S$ has the following characteristics: $(\alpha^*)^* = \alpha$
and $(\alpha\beta)^* = \beta^*\alpha^*$. The definition of the involution depends
on the properties of the respective ring or field $S$. For example, the
operation of matrix transposition is an involution within a real matrix ring.

An ordered triple $(X, S, (\cdot \mid \cdot))$ forms an algebra referred to as a
module. Such a space with $S$ being a field is said to be an inner product
space. An analogous definition of the algebra $(X, S, (\cdot \mid \cdot))$ can be
given for the post-multiplication of a vector $x \in X$ by a scalar
$\alpha \in S$: $x\alpha \in X$.

Definition 5.A1. (Orthogonal vectors.) The vectors $x, y \in X$ are orthogonal,
i.e., $x \perp y$, in an inner product space $(X, S, (\cdot \mid \cdot))$, if
$(x \mid y) = 0$.

In these deliberations we limit the class of inner product spaces to spaces
with a definite inner product (Ben-Artzi and Gohberg, 1994; Bognar, 1974;
Goldstine and Horowitz, 1966). To that end, by introducing a particular
definition of spaces $(X, S)$ and by imposing additional restrictions on the
inner product $(\cdot \mid \cdot)$, we can differentiate between the two following
cases of effective^21 spaces of unquestionably practical significance (Hassibi
et al., 1999; Kailath et al., 2000):

(P1) The space of scalar random variables $(X, S, (\cdot \mid \cdot))$, with
$S \subset \mathbb{R}$, is a Hilbert space of scalar real random variables
($X \subset \mathbb{R}$) of a zero mean value with a (scalar) inner product
$(x \mid y) = E[xy] \in \mathbb{R}$, which fulfils the following condition:
$(x \mid x) = 0 \Leftrightarrow x = 0$, $\forall x \in \mathbb{R}$.

(P2) The space of vector random variables $(X, S, (\cdot \mid \cdot))$, with
$S \subset \mathbb{R}^{n \times n}$, is a Hilbert module of vector real
($X \subset \mathbb{R}^n$) random variables of a zero mean value with a (matrix)
inner product $(x \mid y) = E[xy^T] \in \mathbb{R}^{n \times n}$ such that
$(x \mid x) \ge 0$, $\forall x \in \mathbb{R}^n$.

Let $L\{y(i), 0 \le i \le N\} = L$ be a subspace in $(X, S, (\cdot \mid \cdot))$
spanned by a vector set $\{y(i), 0 \le i \le N\}$, $y(i) \in X$. This means that
any element $x \in L$ can be represented as a linear combination with certain
coefficients from $S$:

$$x = \sum_{i=0}^{N} s(i)y(i), \qquad s(i) \in S. \tag{5.A1}$$

In a general case the notation $x(k) \in L\{x(0); z(i), i \le j\}$ means that
the vector $x(k)$ can be described as a linear combination of the vectors
$\{x(0), z(0), \ldots, z(j)\}$, where also non-square matrix coefficients are
allowed.

Definition 5.A2. (Orthogonal projection on a subspace.) An orthogonal
projection of $x \in X$ on $L$ is a vector $\hat{x} \in L$ for which the
following orthogonality condition is characteristic: $(x - \hat{x}) \perp L$,
which means that $(x - \hat{x} \mid y) = 0$ for $\forall y \in L$.

Moreover, there is a principal lemma on optimal estimation (approximation)
in an arbitrary inner-product space^22, which is stated below.

Lemma 5.A1. (Projection as the basis for optimal estimation in inner product
spaces.) Let $L \subset (X, S, (\cdot \mid \cdot))$. The projection $\hat{x}$ of a
vector $x \in X$ on $L$ is characterised by the following extremal property of
the residue $(x - \hat{x})$:

$$(x - \hat{x} \mid x - \hat{x}) \le (x - y \mid x - y) \quad \text{for } \forall y \in L. \tag{5.A2}$$

Proof. As can easily be verified, the following identity holds:

$$\begin{aligned} (x - y \mid x - y) &= (x - \hat{x} + \hat{x} - y \mid x - \hat{x} + \hat{x} - y) \\ &= (x - \hat{x} \mid x - \hat{x}) + (\hat{x} - y \mid \hat{x} - y) + (\hat{x} - y \mid x - \hat{x}) + (x - \hat{x} \mid \hat{x} - y). \end{aligned}$$

Taking into account the fact that $\hat{x}, y \in L$, we also have
$\hat{x} - y \in L$, and thus $(x - \hat{x}) \perp (\hat{x} - y)$. Hence
$(x - \hat{x} \mid x - \hat{x}) = (x - y \mid x - y) - (\hat{x} - y \mid \hat{x} - y)$,
which means that $(x - \hat{x} \mid x - \hat{x}) \le (x - y \mid x - y)$. It is
worth mentioning that for $S$ being a ring of square matrices, the notation
$s \le r$, for $s, r \in S$, denotes that the matrix $r - s \in S$ is positive
semidefinite, $r - s \ge 0$, and thus $x^T(r - s)x \ge 0$, $\forall x \in X$. ■

^21 Normalised or Banach spaces with an inner product.

^22 Which need not be a Hilbert space.

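Let $x \in \mathbb{R}^n$, $y(i) \in \mathbb{R}^m$, $i \ge 0$, and let $\hat{x}$
represent the projection of $x \in \mathbb{R}^n$ on the subspace
$L\{y(i), 0 \le i \le k\}$. Since $\hat{x} \in L\{y(i), 0 \le i \le k\}$, $\hat{x}$ can
be written as $\hat{x} = K_0y$, where $K_0 \in \mathbb{R}^{n \times m(k+1)}$ is a
matrix of coefficients, while $y = [\, y(0)^T \ \cdots \ y(k)^T \,]^T \in
\mathbb{R}^{m(k+1)}$ denotes the column vector of the data spanning $L$.
Considering the product $(x - Ky \mid x - Ky)$ for any
$K \in \mathbb{R}^{n \times m(k+1)}$:

$$(x - Ky \mid x - Ky) = \begin{bmatrix} I_n & -K \end{bmatrix} \begin{bmatrix} R_x & R_{xy} \\ R_{yx} & R_y \end{bmatrix} \begin{bmatrix} I_n \\ -K^T \end{bmatrix}, \tag{5.A3}$$

where $R_x = (x \mid x) \in \mathbb{R}^{n \times n}$, $R_{xy} = (x \mid y) = R_{yx}^T \in
\mathbb{R}^{n \times m(k+1)}$ and $R_y = (y \mid y) \in \mathbb{R}^{m(k+1) \times m(k+1)}$
are suitable covariance (Gram) matrices in the sense of the assumed definition
of the inner product, we immediately find that the sought matrix $K_0$ can be
established as any solution of the following normal equation:

$$K_0R_y = R_{xy}. \tag{5.A4}$$

This solution exists and is unique for $R_y > 0$, as it then holds that

$$K_0 = R_{xy}R_y^{-1}. \tag{5.A5}$$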
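The normal equation (5.A4) and the quadratic form (5.A3) can be tried out numerically. The sketch below assumes NumPy; the function names and the small Gram matrices ($n = 1$, $m(k+1) = 2$) are illustrative assumptions, not data from the text.

```python
import numpy as np

def projection_coefficients(R_xy, R_y):
    """Solve the normal equation K0 R_y = R_xy (5.A4) without forming R_y^{-1}."""
    return np.linalg.solve(R_y.T, R_xy.T).T

def error_gramian(K, R_x, R_xy, R_y):
    """Evaluate (x - K y | x - K y) through the quadratic form (5.A3)."""
    n = R_x.shape[0]
    left = np.hstack([np.eye(n), -K])                     # [ I_n  -K ]
    gram = np.block([[R_x, R_xy], [R_xy.T, R_y]])         # Gram matrix of [x; y]
    return left @ gram @ left.T

# Assumed Gram matrices of a scalar x and two scalar observations.
R_x = np.array([[2.0]])
R_xy = np.array([[1.0, 0.5]])
R_y = np.array([[2.0, 0.3], [0.3, 1.0]])

K0 = projection_coefficients(R_xy, R_y)
best = error_gramian(K0, R_x, R_xy, R_y)          # error of the projection
worse = error_gramian(K0 + 0.1, R_x, R_xy, R_y)   # error of a perturbed coefficient matrix
```

Any coefficient matrix other than `K0` gives a larger error, in agreement with Lemma 5.A1; for `K0` itself the quadratic form reduces to $R_x - K_0R_{yx}$.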

Denoting the estimation error by $\tilde{x} = x - K_0y \in \mathbb{R}^n$, we have
$(\tilde{x} \mid \tilde{x}) = R_x - R_{xy}R_y^{-1}R_{yx} \in \mathbb{R}^{n \times n}$.
As can easily be verified, this matrix is the Schur complement of the
submatrix $R_y$ of the Gram matrix of the row vector $[\, x^T \ y^T \,]$. It can
also be shown that the corresponding column vector results from the following
linear mapping:

$$\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} I_n & K_0 \\ O_{m(k+1) \times n} & I_{m(k+1)} \end{bmatrix} \begin{bmatrix} \tilde{x} \\ y \end{bmatrix} \tag{5.A6}$$

with its argument being a vector composed of two orthogonal subvectors
($\tilde{x} \perp y$).

The solution of the normal equation (5.A4) can be essentially simplified
in the case of an orthogonal set of vectors spanning the analysed subspace:
the projection $\hat{x}$ is then the sum of the projections of the vector $x$
on the orthogonal directions $y(i)$, $i \in \{0, \ldots, N\}$. A suitable
computational procedure for the orthogonalisation of this vector set can be
readily based on the classical Gram-Schmidt algorithm (Brogan, 1991; Demmel,
1997; Kailath et al., 2000; Stewart, 2001). By proceeding in this way, we
shall find a set of orthogonal vectors $e(i) \in \mathbb{R}^m$,
$i \in \{0, \ldots, k\}$, for which it holds that $e(i) \perp e(j)$,
$\forall i \ne j \in \{0, \ldots, k\}$, and $L\{e(i), 0 \le i \le k\} =
L\{y(i), 0 \le i \le k\}$. Such an approach is of special importance in cases
where consecutive data $y(i)$ are supplied sequentially. It is then purposeful
to process them recursively, without the necessity of memorising and block
processing all the data gathered so far, $\{y(i), 0 \le i \le k\}$, at each
$k$-th sampling instant (Anderson and Moore, 1979; Kailath et al., 2000;
Söderström, 1994).

The above-mentioned orthogonalisation consists in appending to the set
$\{e(i), 0 \le i \le k\}$ an orthogonal vector $e(k+1)$, carrying that part of
the new information included in the most recently obtained data $y(k+1)$ which
cannot be derived on the basis of the correlation of the vector $y(k+1)$ with
the previously considered data, that is,

$$e(k+1) = y(k+1) - \hat{y}(k+1|k), \tag{5.A7}$$

where $\hat{y}(k+1|k) \in \mathbb{R}^m$ stands for the projection of $y(k+1)$ on
the subspace $L\{e(i), 0 \le i \le k\}$. The process $e(k) \in \mathbb{R}^m$
defined by the above formula is referred to as the innovation of the process
of the observation of $y(k)$ (Anderson and Moore, 1979; Haykin, 1996; Kailath
et al., 2000). From the definition of the projection and the orthogonality of
the innovation vectors, which span the subspace $L\{e(i), 0 \le i \le k\}$,
there results the following representation of this projection:

$$\hat{y}(k+1|k) = \sum_{i=0}^{k} (y(k+1) \mid e(i))\,(e(i) \mid e(i))^{-1}e(i). \tag{5.A8}$$

On the basis of (5.A7) and (5.A8) we obtain

$$e(k+1) = y(k+1) - \sum_{i=0}^{k} (y(k+1) \mid e(i))\,(e(i) \mid e(i))^{-1}e(i), \tag{5.A9}$$

with the initial condition $e(0) = y(0)$. The above prescription can be
expressed in the form of the following linear and invertible mapping:

$$\begin{bmatrix} y(0) \\ y(1) \\ \vdots \\ y(k) \end{bmatrix} = \begin{bmatrix} I_m & O_{m \times m} & \cdots & O_{m \times m} \\ (y(1) \mid e(0))(e(0) \mid e(0))^{-1} & I_m & \cdots & O_{m \times m} \\ \vdots & \vdots & \ddots & \vdots \\ (y(k) \mid e(0))(e(0) \mid e(0))^{-1} & (y(k) \mid e(1))(e(1) \mid e(1))^{-1} & \cdots & I_m \end{bmatrix} \begin{bmatrix} e(0) \\ e(1) \\ \vdots \\ e(k) \end{bmatrix}.$$

Taking into account the fact that the matrix of this transformation is
block-triangular, we conclude that the processes $y$ and
$e = [\, e(0)^T \ \cdots \ e(k)^T \,]^T \in \mathbb{R}^{m(k+1)}$ are bound by a
causally invertible relation. The relationship $e \to y$ is referred to as the
relation of modelling (shaping) the observation process for $y$ based on the
generating (generic) white noise $e$, while the reversed relationship
$y \to e$ is known as the whitening of the process of the observation of $y$.
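For scalar observations ($m = 1$) the block-triangular mapping above amounts to a factorisation $R_y = LDL^T$ of the Gram matrix, with a unit lower-triangular $L$ holding the coefficients $(y(i) \mid e(j))(e(j) \mid e(j))^{-1}$ and a diagonal $D$ holding $(e(i) \mid e(i))$. The plain-Python sketch below illustrates this under that scalar assumption; the function names and the example Gram matrix are ours.

```python
def innovations_factor(R_y):
    """Gram-Schmidt on scalar data: R_y = L D L^T with unit lower-triangular L."""
    n = len(R_y)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    D = [0.0] * n
    for i in range(n):
        for j in range(i):
            # (y(i) | e(j)) obtained from the Gram matrix and earlier coefficients
            y_e = R_y[i][j] - sum(L[i][l] * D[l] * L[j][l] for l in range(j))
            L[i][j] = y_e / D[j]                  # (y(i)|e(j)) (e(j)|e(j))^{-1}
        D[i] = R_y[i][i] - sum(L[i][l] ** 2 * D[l] for l in range(i))   # (e(i)|e(i))
    return L, D

def whiten(L, y):
    """Forward substitution e = L^{-1} y, i.e. the recursion (5.A9) with e(0) = y(0)."""
    e = []
    for i, y_i in enumerate(y):
        e.append(y_i - sum(L[i][j] * e[j] for j in range(i)))
    return e

# Assumed Gram matrix of three scalar observations.
R_y = [[4.0, 2.0, 2.0], [2.0, 5.0, 3.0], [2.0, 3.0, 6.0]]
L, D = innovations_factor(R_y)
e = whiten(L, [2.0, 3.0, 4.0])
# L = [[1, 0, 0], [0.5, 1, 0], [0.5, 0.5, 1]], D = [4, 4, 4], e = [2, 2, 2]
```

The division by `D[j]` is where the invertibility of every $(e(j) \mid e(j))$ is needed.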

The above deliberations lead to the inference that the optimal (projected)
form $\hat{x}$ of the vector $x \in X$ is the following:

$$\hat{x} = \sum_{i=0}^{k} (x \mid e(i))\,(e(i) \mid e(i))^{-1}e(i). \tag{5.A10}$$

The latter prescription lays the foundation for effective recursive algorithms
of estimation. Particular formulae of such algorithms, including the procedures
of Kalman filtering (Anderson and Moore, 1979; Brown and Hwang, 1992;
Maybeck, 1979), depend mainly on additional knowledge on the processes
$x$ and $y$. In the Kalman filter the knowledge is included both in the
state-space model of the object and in the information on the probability
density distribution functions of the modelled processes (Anderson and Moore,
1979; Haykin, 1996; Therrien, 1992).
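To illustrate how (5.A10) supports recursive processing, the following plain-Python sketch evaluates the projection of a scalar $x$ on scalar data one sample at a time. It is only a sketch under the scalar assumption; the function name, the Gram data `R_xy`, `R_y` and the observed values are made up for the example.

```python
def project_via_innovations(R_xy, R_y, y):
    """Evaluate (5.A10) for scalar x and scalar data y(0..k), one sample at a time."""
    k1 = len(y)
    coeff = [[0.0] * k1 for _ in range(k1)]   # coeff[i][j] = (y(i)|e(j)) / (e(j)|e(j))
    d, e, x_e = [], [], []                    # (e(i)|e(i)), e(i), (x|e(i))
    x_hat = 0.0
    for i in range(k1):
        for j in range(i):
            y_e = R_y[i][j] - sum(coeff[i][l] * d[l] * coeff[j][l] for l in range(j))
            coeff[i][j] = y_e / d[j]
        d.append(R_y[i][i] - sum(coeff[i][l] ** 2 * d[l] for l in range(i)))
        e.append(y[i] - sum(coeff[i][j] * e[j] for j in range(i)))          # (5.A9)
        x_e.append(R_xy[i] - sum(coeff[i][j] * x_e[j] for j in range(i)))   # (x|e(i))
        x_hat += x_e[i] / d[i] * e[i]         # one more term of the sum (5.A10)
    return x_hat

# Assumed data: (x|y(0)) = 2, (x|y(1)) = 3 and a 2 x 2 Gram matrix of the observations.
R_xy = [2.0, 3.0]
R_y = [[4.0, 2.0], [2.0, 5.0]]
x_hat = project_via_innovations(R_xy, R_y, [2.0, 4.0])
# The normal equation (5.A4) gives K0 = [0.25, 0.5], hence x_hat = 0.25*2 + 0.5*4 = 2.5.
```

Each new sample adds a single term to the running sum, so earlier data need not be stored or reprocessed.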

Finally, let us note that any application of the formula (5.A10) requires
the assumption about the invertibility of all the matrices
$(e(i) \mid e(i))$, $\forall i \in \{0, \ldots, k\}$. This assumption evidently
amounts to strong nonsingularity, which is a more demanding requirement than
the nonsingularity of the respective Gram matrix $R_y$ of the equation (5.A4)
(Anderson and Moore, 1979; Kailath et al., 2000).

6.1. Introduction

Analytical methods of detecting faults and failures constitute a most essen- 
tial branch of current approaches to the technical diagnostics of dynamic 
processes (objects). A major difficulty encountered while solving the tasks of 
the synthesis of detection algorithms concerns the effective determination of 
multiple residues, referred to as residual vectors (Chow and Willsky, 1984; 
Gertler, 1998; Frank, 1990). Such vectors, computed by properly weighting 
the errors of process output estimates, which leads to a suitable exposition 
of the symptoms of faults, are principal premises for making diagnostic deci- 
sions (Chen and Patton, 1999; Chow and Willsky, 1984; Magni and Mouyon, 
1994). Practical generators of residual vectors can be based, for instance, on 
detection observers of states supplying appropriate estimates of the inter- 
nal states of the process. Certainly, apart from fulfilling their detective task, 
such observers should be both robust to the uncertainty of the model of the 
process under supervision and insensitive to the influence of unmeasurable 
disturbances and unmeasurable noise affecting the object and its measure- 
ment channels (Chen and Patton, 1999; Kowalczuk and Suchomski, 1998). 
In practice, these tasks should be fulfilled to the highest possible degree.
The issue of the optimisation of residual signal generators can thus be
expressed as a general problem of multi-objective optimisation (Chen and
Patton, 1999; Kowalczuk and Bialaszewski, 2000; Kowalczuk and Suchomski,
1999; Kowalczuk et al., 1999).

It is clear that the desired functional properties of state observers can
be obtained by designing a suitable eigenstructure^1 of the observation system
(Chen and Patton, 1999; Kowalczuk and Suchomski, 1998). In the simplest
version of the eigenstructure assignment technique, when only the eigenvalues
of the observation system are optimised, the eigenvectors of the system are
a ‘side’ effect of the procedure applied to pole placement by means of linear
feedback (Chen et al., 1996).

In an introduction to analytical diagnosis methods founded on
state observers (Kowalczuk and Suchomski, 1998), we described a procedure
for determining a state-transition matrix of a robust observer that is based
on an algebraic method of shaping the ‘full’ eigenstructure of this matrix,
including both the system eigenvalues and eigenvectors (Frank, 1990; Liu and
Patton, 1998). We focused there on certain basic issues concerning, in
particular, the formulation of the necessary and sufficient conditions for
decoupling the residual vector from disturbances. In (Kowalczuk and Suchomski,
1998) we also gave an algorithmic sketch of the synthesis of the generators of
decoupled residual vectors. In particular steps of this algorithm the following
design objectives are determined: a weighting matrix of the residue generator,
a detailed eigenstructure of the observer-state matrix, and an observer gain
matrix, which secures the resolved eigenstructure for the state matrix.

In a complete design of a robust observer that takes into account diverse 
aspects of the synthesis of the generators of decoupled residual vectors we 
should resolve the following issues (Chen and Patton, 1999; Kowalczuk and 
Suchomski, 1999; Patton and Chen, 1991a; 1991b): 

• the functional representation of an attainable sub-space^2 of eigenvectors
of the observer state-transition matrix of the assumed eigenvalues
(spectrum^3),

• the estimation of the sensitivity of the observer-synthesis procedure to
the uncertainty of the applied object modelling,

• the synthesis of the observer, characterised by the state-transition
matrix of a given eigenstructure and minimal sensitivity to object modelling
errors,

• the determination of the observer gain matrix, assuring the resolved
eigenstructure of the observer state matrix with the highest possible
numerical precision,

• the shaping of the observer gain matrix in cases when the spectra of
the state matrices of both the supervised dynamic object and the designed
observer are not disjoint in terms of set theory (which can, for instance,
always happen while performing a free generation of eigenvalues of the observer
state matrix and while searching for optimal values in terms of certain
external performance criteria),

• the computation of the observer gain matrix in the case of multiple
eigenvalues of the observer state-transition matrix.

^1 That is, the eigenvalues and eigenvectors of the system’s state-transition
matrix (see also Chapter 5, and, in particular, Subsections 5.4.2 and 5.4.3,
as well as Footnote 3 therein).

^2 Also see Subsection 5.4.3, as well as 7.3 and 7.4.

^3 See the references in the previous footnotes.

In this chapter we shall consider the above-mentioned issue of repre- 
sentation, including the problem of the parameterisation of the attainable 
sub-space of the eigenvectors of the observer state matrix of a given eigen- 
structure, corresponding to the assigned set of real eigenvalues. Such pa- 
rameterisation, which will be based on linear schemes, makes a convenient 
foundation for the further optimisation of the observer with respect to spe- 
cific criteria imposed by its application to diagnostic systems. This concerns, 
in particular, the question of increasing the robustness of such systems to 
errors in the modelling of the diagnosed dynamic objects. 



6.2. System modelling 

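Let the discrete-time object (plant) under supervision be described as

$$x(k+1) = Ax(k) + B_uu(k) + B_dd_d(k) + Ef(k), \tag{6.1}$$

$$y(k) = Cx(k) + D_uu(k) + D_nd_n(k) + Ff(k), \tag{6.2}$$

where $x(k) \in \mathbb{R}^n$ denotes a state, $u(k) \in \mathbb{R}^{n_u}$ is a
controlling signal, $y(k) \in \mathbb{R}^m$ is an output, $d_d(k) \in
\mathbb{R}^{n_d}$ stands for an unmeasurable object disturbance, $d_n(k) \in
\mathbb{R}^{n_n}$ represents measurement noise, while $f(k) \in \mathbb{R}^v$
describes a vector of faults. All the matrices of (6.1) and (6.2) are properly
dimensioned: $A \in \mathbb{R}^{n \times n}$, $C \in \mathbb{R}^{m \times n}$,
$B_u \in \mathbb{R}^{n \times n_u}$, $D_u \in \mathbb{R}^{m \times n_u}$,
$B_d \in \mathbb{R}^{n \times n_d}$, $D_n \in \mathbb{R}^{m \times n_n}$,
$E \in \mathbb{R}^{n \times v}$, and $F \in \mathbb{R}^{m \times v}$. Moreover, the

following standard assumptions are made: $(A, C)$ is completely observable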
(see Appendix A for a basic introduction), $B_u$ and $B_d$ are of a full
column rank ($\mathrm{rank}(B_u) = n_u$, $\mathrm{rank}(B_d) = n_d$), $C$ has a
full row rank ($\mathrm{rank}(C) = m \le n$), and $n_n = m$. Because the number
of independent disturbances that can be decoupled cannot be larger than the
number of independent measurement signals, we also assume that $n_d \le m$.
Moreover,
we suppose that the fault signal is modelled as an unknown time function
$f(k)$, while its influence on the state evolution $x(k)$ and the system output
$y(k)$ can be described by the matrices $E$ and $F$, respectively (see Chen
and Patton, 1999; Chen et al., 1996; Kowalczuk and Suchomski, 1998). In a
particular study, for instance, in order to consider faults in a control
channel (such as failures connected with an actuator device), we can assume the
following settings: $E = B_u$ and $F = D_u$ with $v = n_u$. On the other hand,
a fault of a sensor can be modelled as $E = O_{n \times m}$, $F = I_m$, and
$v = m$, where $I_m \in \mathbb{R}^{m \times m}$ denotes an $m \times m$ identity
matrix.

6.3. Preliminary synthesis of residuals

A full-order observer can be presented in the following form (Chen and
Patton, 1999; Kowalczuk and Suchomski, 1998; Patton and Chen, 1991a):

$$\hat{x}(k+1) = A_o\hat{x}(k) + (B_u - K_oD_u)u(k) + K_oy(k),$$

where $\hat{x}(k) \in \mathbb{R}^n$ is a state estimate, $K_o \in
\mathbb{R}^{n \times m}$ is an observer gain, and $A_o = A - K_oC \in
\mathbb{R}^{n \times n}$ denotes the observer state-transition matrix. It is
assumed that $A_o$ is a stable matrix, i.e., all its eigenvalues, meaning
members of the set $\lambda(A_o)$ called the spectrum of the matrix $A_o$,
belong to the open unit disc $\{z \in \mathbb{C} : |z| < 1\}$ of the complex
plane (Brogan, 1991; Jury, 1964). The evolution of the state estimation error

$$x_e(k) = x(k) - \hat{x}(k) \in \mathbb{R}^n$$

can be described by

$$x_e(k+1) = A_ox_e(k) + B_dd_d(k) - K_oD_nd_n(k) + (E - K_oF)f(k). \tag{6.3}$$

An estimate $\hat{y}(k)$ of the plant output takes the form

$$\hat{y}(k) = C\hat{x}(k) + D_uu(k).$$

A weighted residue $y_w(k) \in \mathbb{R}^w$, acting as a principal indicator in
fault detection, results from

$$y_w(k) = Wy_e(k),$$

where $y_e(k) = y(k) - \hat{y}(k) \in \mathbb{R}^m$ denotes an original output
error and $W \in \mathbb{R}^{w \times m}$ is a designed weighting matrix of a
full row rank, $\mathrm{rank}(W) = w \le m$. It is thus assumed that the residual
dimension also cannot be larger than the number of independent measurements
(Chow and Willsky, 1984; Gertler and Singer, 1990; Liu and Patton, 1998; Magni
and Mouyon, 1994). In view of the above, the weighted residue $y_w(k)$ can be
interpreted as the following affine function of the current state error:

$$y_w(k) = WCx_e(k) + WD_nd_n(k) + WFf(k).$$
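The behaviour of the weighted residue can be illustrated by running the plant (6.1)-(6.2) and the observer side by side. The sketch below assumes NumPy and omits disturbances and noise; all matrices, the gain `K_o` (picked so that $A_o = 0.3I$), the weighting `W` and the fault scenario are illustrative assumptions, not a design taken from the text.

```python
import numpy as np

# Assumed example data: n = 2, n_u = 1, m = 2, w = 1.
A = np.array([[0.8, 0.1], [0.0, 0.7]])
B_u = np.array([[0.0], [1.0]])
C = np.eye(2)
D_u = np.zeros((2, 1))
K_o = np.array([[0.5, 0.1], [0.0, 0.4]])     # hypothetical gain giving A_o = 0.3 I
W = np.array([[1.0, 1.0]])                   # hypothetical full-row-rank weighting
A_o = A - K_o @ C

def residuals(E, F, f_seq, u_seq, x0, xhat0):
    """Run plant and observer side by side (no disturbances or noise)."""
    x, xhat, out = x0.copy(), xhat0.copy(), []
    for u, f in zip(u_seq, f_seq):
        y = C @ x + D_u @ u + F @ f                           # plant output (6.2)
        y_hat = C @ xhat + D_u @ u                            # estimated output
        out.append(W @ (y - y_hat))                           # y_w(k) = W y_e(k)
        xhat = A_o @ xhat + (B_u - K_o @ D_u) @ u + K_o @ y   # observer update
        x = A @ x + B_u @ u + E @ f                           # plant state (6.1)
    return np.array(out)

steps = 10
u_seq = [np.array([1.0])] * steps
E_s, F_s = np.zeros((2, 2)), np.eye(2)       # sensor-fault setting: E = O, F = I_m
no_fault = [np.zeros(2)] * steps
late_fault = [np.zeros(2)] * 5 + [np.array([1.0, 0.0])] * 5

r_clean = residuals(E_s, F_s, no_fault, u_seq, np.zeros(2), np.zeros(2))
r_init = residuals(E_s, F_s, no_fault, u_seq, np.array([1.0, -0.5]), np.zeros(2))
r_fault = residuals(E_s, F_s, late_fault, u_seq, np.zeros(2), np.zeros(2))
```

Without a fault and with matching initial states the residue stays at zero; an initial estimation error decays as $WCA_o^kx_e(0)$; and the sensor fault introduced at $k = 5$ appears in the residue at the same instant through the term $WFf(k)$.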

The $z$-transform solution $x_e(z)$ to (6.3) has the form

$$x_e(z) = T_o(z)(E - K_oF)f(z) + T_o(z)B_dd_d(z) - T_o(z)K_oD_nd_n(z) + zT_o(z)x_e(0),$$

where $f(z)$, $d_d(z)$ and $d_n(z)$ are the respective $z$-transforms, $x_e(0)$
denotes an initial state estimation error, while

$$T_o(z) = (zI_n - A_o)^{-1}. \tag{6.4}$$

Consequently, we have the following $z$-transform of the weighted residue:

$$y_w(z) = G_{rf}(z)f(z) + G_{rd}(z)d_d(z) + G_{rn}(z)d_n(z) + G_{rx}(z)x_e(0),$$

where $H = WC$ and the matrix transfer functions

$$G_{rf}(z) = WF + HT_o(z)(E - K_oF), \tag{6.5}$$

$$G_{rd}(z) = HT_o(z)B_d, \tag{6.6}$$

$$G_{rn}(z) = WD_n - HT_o(z)K_oD_n, \tag{6.7}$$

$$G_{rx}(z) = zHT_o(z)$$

describe the influence of faults, plant disturbances, measurement noise, and
the initial state estimate (uncertainty), respectively, on the residue.
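The transfer matrices (6.5)-(6.7) can be evaluated pointwise and cross-checked against the error recursion (6.3). The sketch below assumes NumPy; the function name and all numerical data (including the sensor-fault setting $E = O$, $F = I_m$) are illustrative assumptions. The check uses the fact that, for a stable $A_o$ and a constant fault, the residue settles at $G_{rf}(1)f$.

```python
import numpy as np

def residual_transfer_matrices(z, A, C, B_d, D_n, E, F, K_o, W):
    """Evaluate (6.5)-(6.7) and G_rx at a single point z."""
    n = A.shape[0]
    T_o = np.linalg.inv(z * np.eye(n) - (A - K_o @ C))    # (6.4)
    H = W @ C
    G_rf = W @ F + H @ T_o @ (E - K_o @ F)                # faults -> residue (6.5)
    G_rd = H @ T_o @ B_d                                  # disturbances -> residue (6.6)
    G_rn = W @ D_n - H @ T_o @ K_o @ D_n                  # noise -> residue (6.7)
    G_rx = z * H @ T_o                                    # initial error -> residue
    return G_rf, G_rd, G_rn, G_rx

# Assumed example data (n = 2, m = 2, n_d = 1, n_n = 2, w = 1).
A = np.array([[0.8, 0.1], [0.0, 0.7]])
C = np.eye(2)
B_d = np.array([[1.0], [0.0]])
D_n = np.eye(2)
E, F = np.zeros((2, 2)), np.eye(2)           # sensor-fault setting
K_o = np.array([[0.5, 0.1], [0.0, 0.4]])     # hypothetical gain giving A_o = 0.3 I
W = np.array([[1.0, 1.0]])

G_rf, G_rd, G_rn, G_rx = residual_transfer_matrices(1.0, A, C, B_d, D_n, E, F, K_o, W)

# Cross-check: iterate the error recursion (6.3) under a constant fault until it settles.
f = np.array([1.0, -2.0])
x_e = np.zeros(2)
for _ in range(200):
    x_e = (A - K_o @ C) @ x_e + (E - K_o @ F) @ f
y_w_steady = W @ C @ x_e + W @ F @ f
```

In this particular setting $D_n = F$ and $E = O$, so the noise and fault channels coincide, which illustrates why sensor faults and measurement noise are hard to separate without further structure.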



6.4. Conditions for disturbance decoupling 

In the following we assume that the observer state-transition matrix $A_o$
of a freely assigned spectrum of real eigenvalues
$\lambda(A_o) = \{\lambda_i\}_{i=1}^{n}$ is diagonalisable. It follows that only
the matrices of non-defective eigenstructures are admissible, which means that
any eigenvalue of $A_o$ should be distinct or multiple with the property that
its algebraic multiplicity equals its geometric multiplicity (Chatelin, 1993;
Golub and Van Loan, 1996). Therefore, we assume that in our effective
eigenstructure assignment the number of linearly independent eigenvectors
associated with a given eigenvalue $\lambda_i$ of $A_o$ cannot exceed the
algebraic multiplicity of this eigenvalue.

It is worth noticing that the set of $n \times n$ diagonalisable matrices is
dense in the space^4 $\mathbb{C}^{n \times n}$, which means that even a ‘small’
change in a defective matrix can remodel its Jordan form into a diagonalisable
one (Golub and Van Loan, 1996; Stewart, 2001). Therefore, in many pragmatic
numerical solutions the designer consciously tends (as much as possible)
towards a formulation of the synthesis problem under consideration that
permits feasible solutions belonging to the set of diagonalisable matrices.
Consequently, with reference to the above general remark, we can expect that
the assumption about the diagonalisability of the observer state matrix
applied to our problem of FDI-observer synthesis shall also facilitate all
derivations of the required conditions under which the residue vector is
decoupled from disturbances.

We cannot, however, reduce the area of feasible solutions to the subset 
of diagonalisable matrices with solely distinct eigenvalues. For some practical 

^ Defined over the field C of complex numbers. 
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reasons, following mainly from the obligation for synthesising an important 
class of dead-beat (time-optimal) observers and from the need to simplify 
the design procedures, it is essential to consider observers with diagonalisable 
matrices of multiple eigenvalues (Kowalczuk and Suchomski, 1999). 

On the other hand, the assumption about exclusively real spectra of the 
observer state matrix, which significantly facilitates our presentation, has no 
severe consequences for the designer. And it is common practice that the 
design specifications for the observer speed are given in terms of its time- 
constants or equivalently in terms of the real eigenvalues of its state matrix 
(Ogata, 1995). 

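For a diagonalisable matrix $A_o$ the following similarity relation holds
(Chatelin, 1993; Golub and Van Loan, 1996; Meyer, 2000):

$$A_o = R_n D_\lambda R_n^{-1},$$

where $D_\lambda$ is a diagonal matrix containing all eigenvalues $\lambda_i$ of $A_o$:

$$D_\lambda = \mathrm{diag}\{\lambda_i\}_{i=1}^{n} = \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{bmatrix},$$

and $R_n = [\, r_1 \; \cdots \; r_n \,] \in \mathbb{R}^{n \times n}$ denotes the right modal matrix of $A_o$
composed of its right eigenvectors $r_i \in \mathbb{R}^n$ ordered suitably to $\lambda_i$:
$A_o r_i = \lambda_i r_i$, $\forall i \in \{1, \ldots, n\}$. Furthermore, let
$L_n = [\, l_1 \; \cdots \; l_n \,] \in \mathbb{R}^{n \times n}$ be
a left modal matrix of $A_o$, which means that the columns $l_i \in \mathbb{R}^n$ of $L_n$
are left eigenvectors of $A_o$: $l_i^T A_o = \lambda_i l_i^T$, $\forall i \in \{1, \ldots, n\}$. The eigenvector
sets $\{r_i\}_{i=1}^{n}$ and $\{l_i\}_{i=1}^{n}$ are mutually almost orthogonal: $r_i^T l_j = 0$, $i \neq j$,
$\forall i, j \in \{1, \ldots, n\}$. Hence, for properly scaled modal matrices, we have
$R_n^{-1} = L_n^T$, and so $A_o = R_n D_\lambda L_n^T$. Consequently, $T_o(z)$ corresponding to a
diagonalisable $A_o$ has the following dyadic (spectral) representation:

$$T_o(z) = \sum_{i=1}^{n} \frac{r_i l_i^T}{z - \lambda_i}. \tag{6.8}$$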
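The dyadic representation (6.8) can be checked numerically. The following is a minimal sketch, assuming NumPy and a small hypothetical matrix `A_o` with distinct real eigenvalues (the values are illustrative and not taken from the text); the left modal matrix is scaled so that $L_n^T R_n = I_n$.

```python
import numpy as np

# Hypothetical diagonalisable observer state matrix with real eigenvalues
# (upper triangular, so the spectrum is {0.5, 0.2, -0.3}).
A_o = np.array([[0.5, 1.0, 0.0],
                [0.0, 0.2, 1.0],
                [0.0, 0.0, -0.3]])
n = A_o.shape[0]

lam, R = np.linalg.eig(A_o)   # columns of R are the right eigenvectors r_i
L = np.linalg.inv(R).T        # columns of L are left eigenvectors l_i, scaled so L^T R = I

z = 1.7                       # arbitrary test point outside the spectrum
T_direct = np.linalg.inv(z * np.eye(n) - A_o)            # definition (6.4)
T_dyadic = sum(np.outer(R[:, i], L[:, i]) / (z - lam[i])  # expansion (6.8)
               for i in range(n))

print(np.allclose(T_direct, T_dyadic))
```

The same scaling $R_n^{-1} = L_n^T$ is what makes each dyad $r_i l_i^T$ a spectral projector in this sketch.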



6.4.1. Necessary condition for decoupling 

The following lemma presents the basic necessary condition for the residue
$y_w(k)$ to be completely decoupled from the disturbance $d_d(k)$. This requirement
is equivalent to the identity $G_{rd}(z) = 0_{w \times n_d}$, which, according to (6.6)
and (6.8), can be fulfilled if and only if (Patton and Chen, 1991a; Kowalczuk
and Suchomski, 1998):

$$H r_i l_i^T B_d = 0_{w \times n_d}, \quad \forall i \in \{1, \ldots, n\}. \tag{6.9}$$

Lemma 6.1. (Necessary condition for disturbance decoupling) If $G_{rd}(z) = 0_{w \times n_d}$,
which denotes the complete decoupling of the residual vector from
disturbances, then it holds that

$$H B_d = 0_{w \times n_d}. \tag{6.10}$$

The latter can be interpreted as a design constraint imposed on the weighting
matrix $W$ (Kowalczuk and Suchomski, 1998; Liu and Patton, 1998; Patton
and Chen, 1991a).

Proof. The proof is straightforward. According to (6.9) and $R_n L_n^T = I_n$, we
have

$$\sum_{i=1}^{n} H r_i l_i^T B_d = H R_n L_n^T B_d = H B_d = 0_{w \times n_d},$$

which yields our claim. ■

Rewriting the condition (6.10) as $(CB_d)^T W^T = 0_{n_d \times w}$ produces the
following convenient rule for the choice of the number $w$ of the coordinates
of the residue $y_w(k)$ (Suchomski and Kowalczuk, 1999):

$$w = \dim\left[\mathrm{Ker}\big((CB_d)^T\big)\right] = m - r,$$

where $r = \mathrm{rank}(CB_d)$. Such a choice appears to be rational. We simply
believe that taking the residue of a possibly large dimension improves certain
detectability competences of the detector, while avoiding at the same time
unnecessary redundancy is also of practical merit. Since $CB_d \in \mathbb{R}^{m \times n_d}$,
taking $n_d < m$ yields $r < m$, which gives the required $w > 0$. In order to
ensure the identity $(CB_d)^T W^T = 0_{n_d \times w}$, we apply linear combinations of
vectors from $\mathrm{Ker}\big((CB_d)^T\big)$ to generate the columns of $W^T$.

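A convenient orthonormal basis of this null sub-space can be obtained
by taking the columns of a sub-matrix $U_d \in \mathbb{R}^{m \times w}$ of a full column rank
$\mathrm{rank}(U_d) = w$ resulting from the following singular value decomposition^5
(svd) of $CB_d$:

$$CB_d = \begin{bmatrix} \underline{U}_d & U_d \end{bmatrix} \begin{bmatrix} \underline{\Sigma}_d & 0_{r \times (n_d - r)} \\ 0_{w \times r} & 0_{w \times (n_d - r)} \end{bmatrix} V_d^T,$$

where $[\, \underline{U}_d \; U_d \,] \in \mathbb{R}^{m \times m}$ and $V_d \in \mathbb{R}^{n_d \times n_d}$ are orthonormal matrix
factors, $\underline{\Sigma}_d \in \mathbb{R}^{r \times r}$ is a non-singular diagonal sub-matrix containing non-zero
singular values of $CB_d$, while $\underline{U}_d \in \mathbb{R}^{m \times r}$.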

Consequently, a useful rule for the parameterisation of the matrix $W$
can be stated as follows:

$$W = W_w U_d^T, \tag{6.11}$$

where $W_w \in \mathbb{R}^{w \times w}$ of $\mathrm{rank}(W_w) = w$ acts as a non-singular matrix parameter
subject to a free design (Suchomski and Kowalczuk, 1999). In the sequel,
it is assumed that $w \geq 1$.
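The parameterisation (6.11) can be sketched in a few lines. This is an illustrative sketch only, assuming NumPy; the function name `decoupling_weight`, the rank tolerance and the example matrices `C` and `B_d` are hypothetical and not part of the original text. The free parameter $W_w$ defaults to the identity here, which is merely the simplest non-singular choice.

```python
import numpy as np

def decoupling_weight(C, B_d, W_w=None, tol=1e-10):
    """Return (W, U_d) with W = W_w U_d^T, so that W C B_d = 0 as in (6.10)-(6.11)."""
    CB = C @ B_d                                   # m x n_d
    U, s, _ = np.linalg.svd(CB)                    # full left factor U is m x m
    r = int(np.sum(s > tol * max(s.max(), 1.0)))   # numerical rank of C B_d
    U_d = U[:, r:]                                 # orthonormal basis of Ker((C B_d)^T), m x w
    w = U_d.shape[1]                               # w = m - r
    if W_w is None:
        W_w = np.eye(w)                            # assumed default: simplest non-singular W_w
    return W_w @ U_d.T, U_d

# Hypothetical data: n = 4 states, m = 3 outputs, n_d = 1 disturbance.
C = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
B_d = np.array([[1.0], [1.0], [0.0], [2.0]])

W, U_d = decoupling_weight(C, B_d)
print(W.shape, np.allclose(W @ C @ B_d, 0.0))
```

Any other non-singular $W_w$ can be passed in; the product $W C B_d$ stays zero because the columns of $U_d$ span the left null sub-space of $CB_d$.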

^5 See Forsythe et al. (1977) or Brogan (1991), for instance.

6.4.2. Sufficient conditions for decoupling

On the basis of the Taylor expansion we get the following infinite series
representation of (6.4):

$$T_o(z) = z^{-1}\left(I_n + z^{-1} A_o + z^{-2} A_o^2 + \cdots\right).$$

Now, by taking into account (6.6) and the Cayley-Hamilton theorem (Barnett,
1971; Brogan, 1991), we can easily derive two separate sufficient conditions
for $G_{rd}(z) = 0_{w \times n_d}$, which means that the required nullification of
this transfer matrix is ensured by satisfying one of them.

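Consider the following invariant sub-spaces:

• $(A_o \mid B_d)$-invariant sub-space:

$$\langle A_o \mid B_d \rangle = \sum_{i=0}^{n-1} \mathrm{Im}\big(A_o^i B_d\big);$$

• $(A_o^T \mid H^T)$-invariant sub-space:

$$\langle A_o^T \mid H^T \rangle = \sum_{i=0}^{n-1} \mathrm{Im}\big((A_o^T)^i H^T\big).$$

Sufficient conditions for disturbance decoupling take one of the following
forms (Liu and Patton, 1998; Patton and Chen, 1991a):

(c1) $\langle A_o \mid B_d \rangle \subset \mathrm{Ker}(H)$,

meaning that the matrix products $A_o^i B_d$, $\forall i \geq 0$, have to belong to the right
null sub-space of $H$; or

(c2) $\langle A_o^T \mid H^T \rangle \subset \mathrm{Ker}(B_d^T)$,

yielding the requirement that $H A_o^i$ must belong to the left null sub-space of
$B_d$, $\forall i \geq 0$.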
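Both conditions amount to the vanishing of the products $H A_o^i B_d$, and by the Cayley-Hamilton theorem it is enough to test $i = 0, \ldots, n-1$. The sketch below, assuming NumPy, performs this finite test; the function name, tolerance and example matrices are hypothetical illustrations rather than part of the original method.

```python
import numpy as np

def decoupled_via_invariant_subspace(H, A_o, B_d, tol=1e-10):
    """True if H A_o^i B_d = 0 for i = 0, ..., n-1 (enough by Cayley-Hamilton)."""
    n = A_o.shape[0]
    block = B_d.copy()                  # holds A_o^i B_d, starting with i = 0
    for _ in range(n):
        if np.linalg.norm(H @ block) > tol:
            return False
        block = A_o @ block             # advance to A_o^(i+1) B_d
    return True

# Hypothetical example: the disturbance enters through the first mode only.
A_o = np.diag([0.1, 0.2, 0.3])
B_d = np.array([[1.0], [0.0], [0.0]])
H_good = np.array([[0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])    # ignores the disturbed mode
H_bad = np.array([[1.0, 0.0, 0.0]])     # looks straight at the disturbed mode

print(decoupled_via_invariant_subspace(H_good, A_o, B_d))
print(decoupled_via_invariant_subspace(H_bad, A_o, B_d))
```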

The above conditions give general guidelines for designing disturbance
decoupling generators with no requirement for $A_o$ to be diagonalisable.
Nevertheless, it is not easy to achieve those without some further assistance of
modern design methodologies, such as eigenstructure assignment (Liu and
Patton, 1998). Yet it turns out that a more convenient and practical sufficient
condition for disturbance decoupling can be derived for diagonalisable
state matrices $A_o$. The lemma given below is a simple and ‘natural’ extension
of the original version by Liu and Patton (1998), where only the matrices $A_o$
of distinct eigenvalues are considered.

Lemma 6.2. (Sufficient condition for disturbance decoupling) Complete
disturbance-decoupling $G_{rd}(z) = 0_{w \times n_d}$ is achieved if the following double
condition is satisfied:

$$H B_d = 0_{w \times n_d}, \tag{6.12}$$

$$H = L_w^T, \tag{6.13}$$

where the columns of $L_w = [\, l_1 \; \cdots \; l_w \,] \in \mathbb{R}^{n \times w}$ are left eigenvectors
associated with real eigenvalues $\{\lambda_i\}_{i=1}^{w}$ of a diagonalisable $A_o$.

Proof. Zeroing $H r_i l_i^T B_d$, $\forall i \in \{1, \ldots, w\}$, follows from the equality
$l_i^T B_d = 0_{n_d}^T$, which is an immediate consequence of the observation
$L_w^T B_d = H B_d = 0_{w \times n_d}$. On the other hand, for the right eigenvectors
$\{r_i\}_{i=w+1}^{n}$ of $A_o$ associated with the remaining eigenvalues
$\{\lambda_i\}_{i=w+1}^{n}$ of this matrix we have $H r_i = L_w^T r_i = 0_w$, which
establishes the zero value of $H r_i l_i^T B_d$, $\forall i \in \{w+1, \ldots, n\}$. ■

As has been mentioned above, the necessary condition (6.12) can always
be satisfied via a proper parameterisation (6.11) of the weighting matrix
$W$. Note, however, that from the above proof it follows that a mismatch
between any column of $L_w$, i.e. any left eigenvector $l_i$ for $i \in \{1, \ldots, w\}$,
and a corresponding column of the matrix $(WC)^T$ implies that, in general,
all the subsequent products $H r_i l_i^T B_d$, $i \in \{w+1, \ldots, n\}$, can be non-zero.
A deeper discussion of the sources of this phenomenon will be supplied in
the next section, where the concept of the attainability of eigensubspaces is
explored.



6.5. Parameterisation of attainable eigensubspaces 

Let $\{\lambda_i\}_{i=1}^{n}$ be a set of real numbers being the assigned eigenvalues of the
observer state matrix $A_o = A - K_o C$. The assumed observability of the pair
$(A, C)$ implies (Andry et al., 1983; Kautsky et al., 1985; Sobel et al., 1996)
that for each $\{\lambda_i\}_{i=1}^{n}$ there is a gain $K_o \in \mathbb{R}^{n \times m}$ ensuring the following:

$$\det(A - K_o C - \lambda_i I_n) = 0, \quad \forall i \in \{1, \ldots, n\}.$$

A vector $l_i \in \mathbb{R}^n$ is said to belong to an attainable left eigensubspace of
the pair $(A, C)$ associated with a given eigenvalue $\lambda_i$, $i \in \{1, \ldots, n\}$, of the
observer state matrix $A_o$ if and only if there exists a matrix $K_o \in \mathbb{R}^{n \times m}$
for which $l_i^T (A - K_o C) = \lambda_i l_i^T$.

Hence, $l_i \neq 0_n \in \mathbb{R}^n$, $i \in \{1, \ldots, n\}$, is an attainable left eigenvector
associated with $\lambda_i \in \lambda(A_o)$ if and only if the equality

$$(\lambda_i I_n - A^T) l_i = C^T v_i \tag{6.14}$$

holds for a certain vector parameter $v_i \in \mathbb{R}^m$ satisfying

$$v_i = -K_o^T l_i \tag{6.15}$$

with a $K_o \in \mathbb{R}^{n \times m}$. Consequently, the corresponding attainable left
eigensubspace $L_i(A, C) \subset \mathbb{R}^n$ can be defined as

$$L_i(A, C) = \left\{ l \in \mathbb{R}^n : (\lambda_i I_n - A^T) l = C^T v, \; v \in \mathbb{R}^m \right\}.$$

It can be easily verified that $L_i(A, C)$ is a ‘true’ sub-space in $\mathbb{R}^n$. Let
$l_1, l_2 \in L_i(A, C)$, which means that $\exists v_1, v_2 \in \mathbb{R}^m$ such that
$(\lambda_i I_n - A^T) l_1 = C^T v_1$ and $(\lambda_i I_n - A^T) l_2 = C^T v_2$. We immediately
observe that $(\lambda_i I_n - A^T)(l_1 + l_2) = C^T (v_1 + v_2) \Rightarrow l_1 + l_2 \in L_i(A, C)$.
Clearly, the vector $0_n \in \mathbb{R}^n$,
which cannot be considered as an eigenvector, ‘formally’ belongs to $L_i(A, C)$.

It should be emphasised that the above parameterisation (6.14) of the
attainable left eigensubspace is valid only for left distinct eigenvectors of the
observer state matrix $A_o$ and cannot be applied in the case of left generalised
eigenvectors of this matrix. By the presumed diagonalisability of $A_o$,
the presented approach concerns solely the matrices with distinct eigenvalues
and those of multiple eigenvalues whose algebraic multiplicity equals their
geometric multiplicity (only one-dimensional Jordan blocks are admissible). An
effective parameterisation of eigensubspaces for non-diagonalisable matrices
goes beyond the scope of this chapter and will not be considered here.

Let $l_i \neq 0_n \in L_i(A, C)$. As follows from (6.15), an individual observer-gain
matrix can be derived from the following relationship:

$$K_o = -\big(v_i l_i^{\#}\big)^T, \tag{6.16}$$

where $l_i^{\#} \in \mathbb{R}^{1 \times n}$ denotes any left generalised inverse of $l_i$. Note that we
can employ a uniquely defined vector called the left pseudo-inverse
$l_i^{+} = (l_i^T l_i)^{-1} l_i^T$ (Boullion and Odell, 1971; Rao and Mitra, 1971; Weinmann, 1991)
for this purpose.

The formula (6.14) shows that a convenient method of parametrising
$L_i(A, C)$ can be obtained by considering the null sub-space $\mathrm{Ker}(T_i)$ of the
following matrix:

$$T_i = \begin{bmatrix} \lambda_i I_n - A^T & -C^T \end{bmatrix} \in \mathbb{R}^{n \times (n+m)}. \tag{6.17}$$

The complete observability of the pair $(A, C)$ implies that the matrix
$T_i$ has a full row rank: $\mathrm{rank}(T_i) = n$, $\forall \lambda_i \in \mathbb{R}$ (see Appendix A). It follows
that $\dim(\mathrm{Ker}(T_i)) = m$ and, equivalently, $\dim(L_i(A, C)) = m$. Hence we
have $m$ degrees of freedom in representing the vector $[\, l_i^T \; v_i^T \,]^T \in \mathbb{R}^{n+m}$
in a given basis of the sub-space $\mathrm{Ker}(T_i)$, $\forall i \in \{1, \ldots, n\}$. On account
of the above we establish the following plain lemma concerning the maximal
admissible multiplicity of any assigned eigenvalue of the observer state
matrix $A_o$.


Lemma 6.3. (Maximal multiplicity of the eigenvalue of the observer state
matrix) The multiplicity of any eigenvalue of a diagonalisable $A_o$ cannot
exceed $m$.

In the sequel, we will discuss a convenient method of parametrising the
sub-space $\mathrm{Ker}(T_i)$ for a given eigenvalue $\lambda_i$ of a diagonalisable $A_o$. Two
characteristic cases will be examined. The first (separation) case concerns
the situation of $\lambda(A_o) \cap \lambda(A) = \emptyset$, in which the spectra of the state matrices
$A_o$ and $A$ have no common elements. In the second (mutuality) case we
assume that $\lambda(A_o) \cap \lambda(A) \neq \emptyset$, i.e., that there exists at least one common
eigenvalue of the matrices $A_o$ and $A$.

6.5.1. Separate spectra of the observer and the object 

Let us assume that for each given eigenvalue $\lambda_i$, $i \in \{1, \ldots, n\}$, of the
observer state-transition matrix $A_o$ we have $\lambda_i \notin \lambda(A)$, which is equivalent to
the inequality $\det(\lambda_i I_n - A^T) \neq 0$. Hence, by virtue of the above relationships,
we acquire

$$\mathrm{Ker}(T_i) = \mathrm{Im}\left( \begin{bmatrix} (\lambda_i I_n - A^T)^{-1} C^T \\ I_m \end{bmatrix} \right).$$

Thus, taking any parameter $v_i \in \mathbb{R}^m$ leads to a uniquely determined vector
$l_i \in \mathbb{R}^n$ given by

$$l_i = S_i v_i, \tag{6.18}$$

where $S_i = (\lambda_i I_n - A^T)^{-1} C^T \in \mathbb{R}^{n \times m}$.

Consequently, the column sub-space $\mathrm{Im}(S_i)$ of $S_i$ can be (provisionally)
identified as an attainable left eigensubspace of the pair $(A, C)$ associated
with a given eigenvalue $\lambda_i$: $L_i(A, C) = \mathrm{Im}(S_i)$, $i \in \{1, \ldots, n\}$,
with $\dim(L_i(A, C)) = \mathrm{rank}(S_i) = m$. Any non-zero vector $v_i \in \mathbb{R}^m$
can serve as a parameter for the corresponding $l_i \in \mathrm{Im}(S_i)$, and taking
$v_i \neq 0_m$ ensures $l_i \neq 0_n$. On the other hand, according to (6.18), any
$l_i \in \mathrm{Im}(S_i)$ can be parameterised by a uniquely determined vector-parameter
$v_i = S_i^{+} l_i = (S_i^T S_i)^{-1} S_i^T l_i$.
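The separation case can be illustrated by combining (6.18) with the single-vector gain formula (6.16). The following is a minimal sketch, assuming NumPy and a hypothetical observable pair `(A, C)` with one output; the chosen eigenvalue and parameter are illustrative only.

```python
import numpy as np

# Hypothetical observable pair; the eigenvalues of A are complex,
# so the real value lam_i below is certainly not in the spectrum of A.
A = np.array([[0.0, 1.0],
              [-0.5, 1.2]])
C = np.array([[1.0, 0.0]])
n, m = A.shape[0], C.shape[0]

lam_i = 0.1                                            # assigned observer eigenvalue
S_i = np.linalg.solve(lam_i * np.eye(n) - A.T, C.T)    # S_i = (lam_i I - A^T)^{-1} C^T
v_i = np.array([1.0])                                  # free non-zero parameter
l_i = S_i @ v_i                                        # attainable left eigenvector (6.18)

l_pinv = l_i / (l_i @ l_i)                             # left pseudo-inverse (l^T l)^{-1} l^T
K_o = -np.outer(l_pinv, v_i)                           # K_o = -(v_i l_i^+)^T, shape n x m
A_o = A - K_o @ C

print(np.allclose(l_i @ A_o, lam_i * l_i))             # l_i^T A_o = lam_i l_i^T
print(np.linalg.pinv(S_i) @ l_i)                       # recovers the parameter v_i
```

Only the eigenvalue $\lambda_i$ is fixed by this individually tuned gain; the remaining eigenvalue of $A_o$ takes whatever value results.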

6.5.2. Mutuality in the spectra of the observer and the object 

Let the following hold: $\lambda_i \in \lambda(A)$ for a certain $i \in \{1, \ldots, n\}$, which means
that $\det(\lambda_i I_n - A^T) = 0$. Hence it is clear that the previously given
parameterisation (6.18) is not suitable for this case. Moreover, it is worth noticing
that $\lambda_i$ can also appear to be a multiple eigenvalue of $A$. An orthonormal
basis of the null sub-space $\mathrm{Ker}(T_i)$ of the matrix $T_i$ defined by (6.17) can
be derived via performing the svd of this matrix. Since $\mathrm{rank}(T_i) = n$, we
obtain $T_i = U_i \Sigma_i [\, \underline{V}_i \; V_i \,]^T$, where $U_i \in \mathbb{R}^{n \times n}$ and
$[\, \underline{V}_i \; V_i \,] \in \mathbb{R}^{(n+m) \times (n+m)}$,
with $\underline{V}_i \in \mathbb{R}^{(n+m) \times n}$ and $V_i \in \mathbb{R}^{(n+m) \times m}$, are orthonormal matrix factors,
while $\Sigma_i = [\, \underline{\Sigma}_i \; 0_{n \times m} \,] \in \mathbb{R}^{n \times (n+m)}$ has a block structure with a non-singular
diagonal sub-matrix $\underline{\Sigma}_i = \mathrm{diag}\{\sigma_{i,j}\}_{j=1}^{n} \in \mathbb{R}^{n \times n}$, which contains the
singular values $\sigma_{i,1} \geq \cdots \geq \sigma_{i,n} > 0$ of $T_i$. It thus follows that
$T_i = U_i \underline{\Sigma}_i \underline{V}_i^T$,
and the columns of the sub-matrix $V_i$ represent the required orthonormal
basis for the analysed null sub-space: $\mathrm{Ker}(T_i) = \mathrm{Im}(V_i) = \{V_i \nu : \nu \in \mathbb{R}^m\}$.
With this basis we conclude that $[\, l_i^T \; v_i^T \,]^T \in \mathrm{Ker}(T_i)$ has the following
representation:

$$\begin{bmatrix} l_i \\ v_i \end{bmatrix} = V_i \nu_i,$$

where the vector $\nu_i \in \mathbb{R}^m$ makes a useful design parameter. This representation
is characterised by the following two practical lemmas.

Lemma 6.4. (Parameterisation of an attainable eigensubspace: I) Taking
any non-zero parameter $\nu_i \in \mathbb{R}^m$ results in a corresponding (non-zero) left
eigenvector $l_i$.

Proof. (i) On the assumption that the matrix $C$ is of a full row rank, the
necessary condition for zeroing $l_i = 0_n$ takes thus the form of the equality
$v_i = 0_m$. It therefore follows that a non-zero parameter $v_i \neq 0_m$ gives a
corresponding non-zero $l_i$.

(ii) Since $V_i$ has a full column rank, we observe that for a non-zero
$\nu_i \neq 0_m$ it holds that $[\, l_i^T \; v_i^T \,]^T \neq 0_{n+m}$. The case $l_i = 0_n$ should be excluded,
since zeroing $l_i$ in the vector $[\, l_i^T \; v_i^T \,]^T$ would imply a non-zero $v_i$, which
contradicts the earlier observation. ■



Lemma 6.5. (Parameterisation of attainable eigensubspace: II) Considering
the following partition of the basis:

$$V_i = \begin{bmatrix} V_{1,i} \\ V_{2,i} \end{bmatrix}, \quad V_{1,i} \in \mathbb{R}^{n \times m}, \quad V_{2,i} \in \mathbb{R}^{m \times m},$$

we observe that the upper sub-matrix $V_{1,i}$ has a full column rank,
$\mathrm{rank}(V_{1,i}) = m$, while the lower sub-matrix $V_{2,i}$ is singular.

Proof. (i) The full-column rankness of the upper sub-matrix follows immediately
from Lemma 6.4.

(ii) On the other hand, we have $\exists l_i \neq 0_n \in \mathbb{R}^n : (\lambda_i I_n - A^T) l_i = 0_n$.
Clearly, any left eigenvector of $A$ satisfies this equality. It thus follows that
$[\, l_i^T \; 0_m^T \,]^T \in \mathrm{Ker}(T_i)$. Consequently, $\exists \nu_i \neq 0_m \in \mathbb{R}^m$ for which

$$\begin{bmatrix} l_i \\ 0_m \end{bmatrix} = \begin{bmatrix} V_{1,i} \\ V_{2,i} \end{bmatrix} \nu_i.$$

By virtue of the above, we conclude that the square sub-matrix $V_{2,i}$ should
be singular. ■

It thus follows that the column sub-space $\mathrm{Im}(V_{1,i})$ can be provisionally
recognised as an attainable left eigensubspace of the pair $(A, C)$ associated
with a given eigenvalue $\lambda_i$: $L_i(A, C) = \mathrm{Im}(V_{1,i})$, $i \in \{1, \ldots, n\}$. It also holds
that $\dim(L_i(A, C)) = \mathrm{rank}(V_{1,i}) = m$. Any vector $\nu_i \in \mathbb{R}^m$ can serve as a
parameter for the corresponding $l_i \in \mathrm{Im}(V_{1,i})$:

$$l_i = V_{1,i} \nu_i, \tag{6.19}$$

and taking a non-zero $\nu_i \neq 0_m$ ensures $l_i \neq 0_n$. On the other hand, any
$l_i \in \mathrm{Im}(V_{1,i})$ can be represented by the uniquely determined parameters
$\nu_i \in \mathbb{R}^m$ and $v_i \in \mathbb{R}^m$: $\nu_i = V_{1,i}^{+} l_i$, $v_i = V_{2,i} V_{1,i}^{+} l_i$, where
$V_{1,i}^{+} = (V_{1,i}^T V_{1,i})^{-1} V_{1,i}^T$.

A particular (individually tuned) observer gain $K_o$ corresponding to $l_i$
can be computed by virtue of (6.16).
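The null-space parameterisation of the mutuality case can be sketched as follows, assuming NumPy. The pair `(A, C)`, the shared eigenvalue and the parameter `nu_i` are hypothetical examples; the symbol `nu_i` stands for the free parameter of (6.19), as distinct from `v_i` of (6.14).

```python
import numpy as np

# Hypothetical observable pair with n = 3 states and m = 2 outputs.
A = np.diag([0.5, 0.2, 0.8])
C = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
n, m = A.shape[0], C.shape[0]
lam_i = 0.5                                          # also an eigenvalue of A

T_i = np.hstack([lam_i * np.eye(n) - A.T, -C.T])     # matrix (6.17), n x (n+m)
_, s, Vt = np.linalg.svd(T_i)                        # full svd: Vt is (n+m) x (n+m)
V_i = Vt[n:].T                                       # orthonormal basis of Ker(T_i)
V1, V2 = V_i[:n], V_i[n:]                            # partition of Lemma 6.5

nu_i = np.array([1.0, 0.5])                          # free non-zero parameter
l_i = V1 @ nu_i                                      # attainable left eigenvector (6.19)
v_i = V2 @ nu_i                                      # matching parameter of (6.14)

K_o = -np.outer(l_i / (l_i @ l_i), v_i)              # individual gain via (6.16)
A_o = A - K_o @ C

print(np.linalg.matrix_rank(V1))                     # full column rank, equals m
print(np.linalg.svd(V2, compute_uv=False).min())     # numerically zero: V2 is singular
print(np.allclose(l_i @ A_o, lam_i * l_i))
```

Because the basis comes from an svd rather than from a matrix inverse, the same sketch also runs when $\lambda_i$ is only close to the spectrum of $A$.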

Why have the above parameterisations of $L_i(A, C)$ been called provisional?
Why are they not good enough? A simple observation explains our
lexical modesty. By taking $l_i$ and $l_j$, $\lambda_i \neq \lambda_j$, $i, j \in \{1, \ldots, n\}$, as suitable
eigenvectors from $L_i(A, C)$ and $L_j(A, C)$, we should accept only those vectors
which are linearly independent. For example, careful parameterisation is
required if $\mathrm{Im}(C^T)$ is a $(\lambda_i I_n - A^T)(\lambda_j I_n - A^T)^{-1}$-invariant sub-space. A
less sophisticated situation takes place if $L_i(A, C)$ corresponds to a multiple
eigenvalue: the linear independence of suitably parameterised elements of the
space $L_i(A, C)$ is the necessary condition for accepting them as the required
attainable left eigenvectors associated with this eigenvalue.

An algorithm for the optimal parameterisation of the relevant attainable
eigensubspaces is presented below. The proposed approach to the optimal
parameterisation of the space $L_i(A, C)$ consists in searching for a set of
attainable left eigenvectors $l_i \in L_i(A, C)$, $\forall i \in \{1, \ldots, n\}$, which are mutually
linearly independent and fulfil additional requirements, concerning disturbance
decoupling and the numerical sensitivity of the resulting observer.

6.5.3. Partial observer gain 

Let $\{l_i\}_{i=1}^{k}$, $k \leq n$, denote the set of linearly independent attainable left
eigenvectors $l_i \in \mathbb{R}^n$ orderly associated with the set of eigenvalues $\{\lambda_i\}_{i=1}^{k}$
of the observer state matrix $A_o$. Thus there exists a set $\{v_i\}_{i=1}^{k}$ containing
the vectors $v_i \in \mathbb{R}^m$ for which (6.15) is valid, $\forall i \leq k$. This implies that a
‘common’ gain $K_o \in \mathbb{R}^{n \times m}$ can be derived from the following set of linear
equations: $\{l_i^T K_o = -v_i^T\}_{i=1}^{k}$. The solution takes the form

$$K_o = -\big(V_k L_k^{\#}\big)^T, \tag{6.20}$$

where $L_k = [\, l_1 \; \cdots \; l_k \,] \in \mathbb{R}^{n \times k}$ and $V_k = [\, v_1 \; \cdots \; v_k \,] \in \mathbb{R}^{m \times k}$ can be regarded
as the design parameters, while $L_k^{\#} \in \mathbb{R}^{k \times n}$ denotes a left generalised inverse
of $L_k$. Similarly to the previously shown development, the left pseudo-inverse
$L_k^{+} = (L_k^T L_k)^{-1} L_k^T \in \mathbb{R}^{k \times n}$ can be applied.

It is worth noticing that in the case of $k < n$, the remaining $n - k$
eigenvalues of $A_o$ excluded from the purposeful design take the resulting
values. The full column rank of $L_k$ is the necessary and sufficient condition
for the existence of $L_k^{\#}$. Since the set $\{l_i\}_{i=1}^{k}$ is linearly independent, we
always have $\mathrm{rank}(L_k) = k$.
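The common gain (6.20) can be illustrated with a short sketch, assuming NumPy. The pair `(A, C)`, the eigenvalues and the parameter matrix `V_k` are hypothetical; the eigenvectors are generated with the separation-case formula (6.18) and the left pseudo-inverse is used for $L_k^{\#}$.

```python
import numpy as np

# Hypothetical observable pair (n = 2, m = 1); A has complex eigenvalues,
# so both assigned real eigenvalues lie outside its spectrum.
A = np.array([[0.0, 1.0],
              [-0.5, 1.2]])
C = np.array([[1.0, 0.0]])
n = A.shape[0]

lams = [0.1, 0.3]                       # eigenvalues assigned to A_o (k = n = 2)
V_k = np.array([[1.0, 1.0]])            # m x k, free parameters v_i as columns

# Attainable left eigenvectors l_i = (lam_i I - A^T)^{-1} C^T v_i as columns of L_k.
L_k = np.column_stack([
    np.linalg.solve(lam * np.eye(n) - A.T, C.T @ V_k[:, i])
    for i, lam in enumerate(lams)
])

K_o = -(V_k @ np.linalg.pinv(L_k)).T    # common gain (6.20), n x m
A_o = A - K_o @ C

print(np.sort(np.linalg.eigvals(A_o).real))
```

With $k = n$ the whole spectrum of $A_o$ is assigned; with $k < n$ the same formula would fix only the first $k$ eigenvalues.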



6.6. Synthesis of a numerically robust state observer 



Let $A_o + \delta A_o$ denote a perturbed state matrix of an observer, where
$\delta A_o \in \mathbb{R}^{n \times n}$ denotes a small perturbation (deviation) from the nominal state matrix
$A_o$. The matrix $A_o + \delta A_o$ yields the perturbed eigenvalues $\lambda_i + \delta\lambda_i \in \mathbb{R}$ and
the perturbed right eigenvectors $r_i + \delta r_i \in \mathbb{R}^n$, $i \in \{1, \ldots, n\}$. While seeking
the potential sources of $\delta A_o$ we can list uncertainties in plant modelling
$(A, C)$ or inaccuracies in the numerical implementation of the observer gain
$K_o$ (see Burrows et al., 1989; Fletcher, 1988; Kautsky et al., 1985; Sobel et
al., 1996). Employing a first-order approximation for the generic condition

$$(A_o + \delta A_o)(r_i + \delta r_i) = (\lambda_i + \delta\lambda_i)(r_i + \delta r_i), \quad \forall i \in \{1, \ldots, n\},$$

yields the following formula:

$$A_o \delta r_i + \delta A_o r_i = \lambda_i \delta r_i + \delta\lambda_i r_i.$$

Therefore, for any left eigenvector $l_i$ the following incremental equation
holds:

$$\delta\lambda_i \, l_i^T r_i = l_i^T \delta A_o r_i,$$

from which we obtain the following estimate of the size of the eigenvalue
perturbation $|\delta\lambda_i|$ formulated in terms of consistent vector norms (Demmel,
1997; Kato, 1995; Wilkinson, 1965):

$$|\delta\lambda_i| \leq \frac{\|l_i\| \, \|r_i\|}{|l_i^T r_i|} \|\delta A_o\| + O\big(\|\delta A_o\|^2\big), \quad \forall i \in \{1, \ldots, n\}.$$

The geometrical quantity

$$|\sec(\angle(l_i, r_i))| = \frac{\|l_i\| \, \|r_i\|}{|l_i^T r_i|},$$

with $\angle(l_i, r_i)$ denoting the angle between the eigenvectors $l_i$ and $r_i$, can thus
serve as a convenient relative measure of the sensitivity of $\lambda_i$, $\forall i \in \{1, \ldots, n\}$,
to the unstructured perturbations of the observer state matrix. Note that if
$A_o$ is orthogonal, we observe that $|\sec(\angle(l_i, r_i))| = 1$, $\forall i \in \{1, \ldots, n\}$. The
quantity $\max_i |\sec(\angle(l_i, r_i))|$ can thus be recognised as a suitable measure
for the spectral sensitivity of $A_o$ to the unstructured perturbation of its
entries (Liu and Patton, 1998; Wilkinson, 1965). It is also well known that
the condition number (Demmel, 1997; Meyer, 2000) of a left modal^6 matrix
$L_n \in \mathbb{R}^{n \times n}$ associated with $A_o$ defined as $\kappa(L_n) = \|L_n\| \, \|L_n^{-1}\|$ can be used
as a convenient majorising quantity of $\max_i |\sec(\angle(l_i, r_i))|$, since it holds
(Demmel, 1983; Wilkinson, 1965) that $\max_i |\sec(\angle(l_i, r_i))| \leq \kappa(L_n)$.

^6 The columns of $L_n$ are simply the left eigenvectors of $A_o$.
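The two sensitivity measures can be compared numerically. The sketch below assumes NumPy and the spectral norm for $\kappa(L_n)$ (the text leaves the norm unspecified); the function name and both test matrices are hypothetical. It contrasts a symmetric matrix, whose eigenvectors are orthogonal, with a strongly non-normal one.

```python
import numpy as np

def eigenvalue_sensitivities(A_o):
    """Return eigenvalues, |sec(angle(l_i, r_i))| for each i, and kappa(L_n)."""
    lam, R = np.linalg.eig(A_o)                  # right eigenvectors as columns
    L = np.linalg.inv(R).T                       # left eigenvectors as columns, L^T R = I
    sec = np.array([
        np.linalg.norm(L[:, i]) * np.linalg.norm(R[:, i]) / abs(L[:, i] @ R[:, i])
        for i in range(len(lam))
    ])
    kappa = np.linalg.cond(L)                    # spectral-norm condition number (assumed norm)
    return lam, sec, kappa

A_sym = np.array([[0.5, 0.1],
                  [0.1, 0.2]])                   # symmetric: orthogonal eigenvectors
A_skew = np.array([[0.5, 10.0],
                   [0.0, 0.4]])                  # non-normal: nearly parallel eigenvectors

_, sec_sym, kappa_sym = eigenvalue_sensitivities(A_sym)
_, sec_skew, kappa_skew = eigenvalue_sensitivities(A_skew)
print(sec_sym, kappa_sym)
print(sec_skew, kappa_skew)
```

In both cases the largest secant stays below the condition number, in line with the majorising inequality quoted above.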

By virtue of the above we infer that the influence of the deviations $\delta A_o$
can be diminished by shaping the left modal matrix $L_n$ so as to get a
sufficiently small value of $\kappa(L_n)$. On the other hand, the inverse of this index
can be interpreted as a measure of the distance between $L_n$ and the ‘nearest’
singular matrix (Demmel, 1997; Golub and Van Loan, 1996; Weinmann,
1991). Since $L_n$ is also a matrix of linear equations, which are used to solve
the observer gain $K_o$ (see (6.16) and (6.20)), we deduce that by diminishing
$\kappa(L_n)$ we also improve the numerical properties of the procedure of computing
$K_o$. Inasmuch as the condition number takes its minimum $\kappa(L_n) = 1$
for an orthogonal $L_n$ (Demmel, 1997; Golub and Van Loan, 1996), a suitable
synthesis of $K_o$ for a given set of the established eigenvalues $\{\lambda_i\}_{i=1}^{n}$
should include a mechanism of the orthonormalisation of the corresponding
attainable left eigenvectors $\{l_i\}_{i=1}^{n}$.

The proposed parameterisation of the attainable left eigenspaces is affine
(linear) with respect to the freely assigned parameters. Thus the corresponding
procedure of the numerically robust synthesis of a state observer for an
assumed set of eigenvalues is principally not a difficult task because in such
cases all eigenvectors are chosen with respect to a soft numerical requirement
of orthonormality. The problem becomes quite complex when we consider
the issue of eigenstructure assignment in the case of detection observers.
Then not only should the above-mentioned soft requirement for the attainable
left eigenvectors $l_i$ for $i \in \{w+1, \ldots, n\}$ be satisfied but, principally,
the hard constraints resulting from the disturbance decoupling principle are
to be fulfilled at the first stage of shaping the attainable left eigenvectors $l_i$
for $i \in \{1, \ldots, w\}$. This specific problem of a greater difficulty will be solved
in the next subsection with the use of all orthogonalisation tools developed
till now.

On account of the above remark, let us first consider the problem of the 
optimal parameterisation of the attainable left eigensubspaces corresponding 
to a given set of eigenvalues of a diagonalisable observer state matrix. The 
applied optimality of design will be recognised in terms of the soft require- 
ment, thus a gap between the resultant left modal matrix and an orthonormal 
matrix should be minimised. 

An angular distance between a given vector $l_i \in \mathbb{R}^n$, $i = w+1, \ldots, n$,
and the orthogonal complement of the column sub-space $\mathrm{Im}(L_{i-1})$ of the
matrix $L_{i-1} = [\, l_1 \; \cdots \; l_{i-1} \,] \in \mathbb{R}^{n \times (i-1)}$ can be determined as

$$\cos\big(\angle(l_i, \mathrm{Im}(L_{i-1})^{\perp})\big) = \frac{\big\|P_{\mathrm{Im}(L_{i-1})^{\perp}} \, l_i\big\|}{\|l_i\|},$$

where $P_{\mathrm{Im}(L_{i-1})^{\perp}} \in \mathbb{R}^{n \times n}$ denotes the matrix of orthogonal projection on
the sub-space $\mathrm{Im}(L_{i-1})^{\perp}$. By definition, the matrix $L_{i-1}$ is of a full column
rank, hence $\dim(\mathrm{Im}(L_{i-1})^{\perp}) = n - (i - 1)$.

While considering the separation case $\lambda_i \notin \lambda(A)$ for $i = w+1, \ldots, n$,
we seek an optimal parameter $v_i \in \mathbb{R}^m$, which minimises the proposed angular
measure $\angle(l_i, \mathrm{Im}(L_{i-1})^{\perp})$ under the standard normalisation constraint
$\|l_i\| = 1$. In the case of mutuality, when $\lambda_i \in \lambda(A)$ is the subject of our
interest, we employ the effective parameterisation of the form $l_i = V_{1,i} \nu_i$,
in which $\nu_i \in \mathbb{R}^m$ stands for a free parameter. Below, we shall start our
deliberations with the case of $\lambda_i \notin \lambda(A)$, while the more intricate case of
$\lambda_i \in \lambda(A)$ will be discussed later on.

The computational algorithms we offer are principally based on the
guidelines given in Appendix B. It is worth noticing that the proposed method
of parametrising the attainable left eigensubspace radically facilitates the
discussed FDI design procedures because searching for the design parameters
$v_i$ or $\nu_i$ can be easily expressed in terms of a standard optimisation task
with linear constraints.

6.6.1. Separate spectra of the observer and the object 

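In this case ($\lambda_i \notin \lambda(A)$) we have $l_i = S_i v_i$ for the free-design parameters
$v_i$, $i = w+1, \ldots, n$. Employing the method of optimal tuning given in
Appendix B yields the required solution in the following form:

$$v_i = V_i \underline{\Sigma}_i^{-1} v_{M_i,1} \quad \text{and} \quad l_i = \underline{U}_i v_{M_i,1}, \tag{6.21}$$

where

• the matrices $\underline{U}_i \in \mathbb{R}^{n \times m}$, $V_i \in \mathbb{R}^{m \times m}$ and $\underline{\Sigma}_i \in \mathbb{R}^{m \times m}$ are provided
by the svd of $S_i$:

$$S_i = U_i \Sigma_i V_i^T$$

with two orthonormal matrix factors $U_i = [\, \underline{U}_i \; \overline{U}_i \,] \in \mathbb{R}^{n \times n}$ and $V_i$,
$\overline{U}_i \in \mathbb{R}^{n \times (n-m)}$, while $\underline{\Sigma}_i$ is a non-singular diagonal sub-matrix of the factor
$\Sigma_i \in \mathbb{R}^{n \times m}$;

• the vector parameter $v_{M_i,1} \in \mathbb{R}^m$ denotes the first right singular vector
of an auxiliary matrix:

$$M_i = U_{L_{i-1}} U_{L_{i-1}}^T \underline{U}_i \in \mathbb{R}^{n \times m}$$

associated with the largest (first) singular value of this matrix, and

$$L_{i-1} = \begin{bmatrix} \underline{U}_{L_{i-1}} & U_{L_{i-1}} \end{bmatrix} \begin{bmatrix} \underline{\Sigma}_{L_{i-1}} \\ 0_{(n-(i-1)) \times (i-1)} \end{bmatrix} V_{L_{i-1}}^T$$

represents the svd of $L_{i-1} \in \mathbb{R}^{n \times (i-1)}$ containing a diagonal non-singular
sub-matrix $\underline{\Sigma}_{L_{i-1}} \in \mathbb{R}^{(i-1) \times (i-1)}$ and two orthonormal matrix factors: the
first matrix, $[\, \underline{U}_{L_{i-1}} \; U_{L_{i-1}} \,] \in \mathbb{R}^{n \times n}$, with properly dimensioned sub-matrices
$\underline{U}_{L_{i-1}} \in \mathbb{R}^{n \times (i-1)}$ and $U_{L_{i-1}} \in \mathbb{R}^{n \times (n-(i-1))}$, and the second one,
$V_{L_{i-1}} \in \mathbb{R}^{(i-1) \times (i-1)}$.

In order to establish the above solution it suffices to show that $\mathrm{Im}(S_i) =
\mathrm{Im}(\underline{U}_i)$ and $\mathrm{Im}(L_{i-1})^{\perp} = \mathrm{Im}(U_{L_{i-1}})$. The solution $l_i$ of (6.21) does always
exist. We should, however, emphasise (see Appendix B) that this solution
can only be accepted if $\mathrm{Im}(S_i) \not\subset \mathrm{Im}(L_{i-1})$. Let $\lambda_i \notin \{\lambda_k\}_{k=1}^{i-1}$ for
$i \in \{w+1, \ldots, n\}$, then the admissible multiplicity of this eigenvalue is equal to
the difference of dimensions of the applicable sub-spaces:

$$\dim(\mathrm{Im}(S_i)) - \dim\big(\mathrm{Im}(S_i) \cap \mathrm{Im}(L_{i-1})\big) = \mathrm{rank}\big([\, S_i \; L_{i-1} \,]\big) - (i - 1).$$

This difference belongs to the closed interval $[0, m]$. The inconvenient
zero value, corresponding to $\mathrm{Im}(S_i) \subset \mathrm{Im}(L_{i-1})$, can, however, arise
only for a sufficiently large $i \geq m + 1$. The highest admissible multiplicity
$m$ refers to the zero intersection $\mathrm{Im}(S_i) \cap \mathrm{Im}(L_{i-1}) = \{0_n\}$. A practical
test for the applicability of a given solution $l_i$ can be founded on checking for
a non-zero projection of $l_i$ onto the orthogonal complement of the sub-space
$\mathrm{Im}(L_{i-1})$: $U_{L_{i-1}} U_{L_{i-1}}^T l_i \neq 0_n$. Clearly, the standard problem of specifying a
‘numerically robust’ zero met here should be solved in a rational manner.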
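The selection rule of this subsection can be sketched as a small routine, assuming NumPy. It follows the structure of (6.21) as read here: thin svd of $S_i$, projection onto $\mathrm{Im}(L_{i-1})^{\perp}$, first right singular vector of $M_i$, and a tolerance-based rejection test. The function name, the tolerance `tol` and the example pair `(A, C)` are hypothetical, and the common gain is formed with (6.20) using the pseudo-inverse.

```python
import numpy as np

def next_left_eigenvector(A, C, lam_i, L_prev, tol=1e-10):
    """Pick l_i in Im(S_i) closest (in angle) to Im(L_prev)^perp, cf. (6.21)."""
    n = A.shape[0]
    S = np.linalg.solve(lam_i * np.eye(n) - A.T, C.T)        # S_i = (lam_i I - A^T)^{-1} C^T
    U_S, sig, Vt_S = np.linalg.svd(S, full_matrices=False)   # thin svd: S = U_S diag(sig) Vt_S
    k = L_prev.shape[1]
    if k > 0:
        U_L = np.linalg.svd(L_prev)[0]                       # full n x n left factor
        U_perp = U_L[:, k:]                                  # basis of Im(L_prev)^perp (full column rank assumed)
    else:
        U_perp = np.eye(n)                                   # nothing to be independent of yet
    M = U_perp @ U_perp.T @ U_S                              # auxiliary matrix M_i
    x = np.linalg.svd(M)[2][0]                               # first right singular vector of M_i
    l = U_S @ x                                              # unit-norm attainable left eigenvector
    v = Vt_S.T @ (x / sig)                                   # parameter v_i with S_i v_i = l_i
    if np.linalg.norm(U_perp.T @ l) <= tol:                  # 'numerical zero' test
        raise ValueError("Im(S_i) is contained in Im(L_prev); candidate rejected")
    return l, v

# Hypothetical observable pair (n = 3, m = 2); neither 0.1 nor 0.3 is an eigenvalue of A.
A = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.1, -0.5, 1.0]])
C = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])
lams = [0.1, 0.1, 0.3]                                       # a double eigenvalue is admissible since m = 2

L = np.zeros((3, 0))
V = np.zeros((2, 0))
for lam in lams:
    l, v = next_left_eigenvector(A, C, lam, L)
    L = np.column_stack([L, l])
    V = np.column_stack([V, v])

K_o = -(V @ np.linalg.pinv(L)).T                             # common gain (6.20)
A_o = A - K_o @ C
print(np.linalg.cond(L))
```

In this example the two eigenvectors of the double eigenvalue come out mutually orthogonal, while the third is only as close to orthogonal as its attainable sub-space permits.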



6.6.2. Mutuality in the spectra of the observer and the object 

In the previous subsection we have considered the problem of the optimal
parameterisation of a given linear injective map, which permits a convenient
generation of the attainable left eigenvectors of the observer state matrix
for a given parameter. The method described above, after certain obvious
modifications, can also be used in the other distinguished case of spectral
mutuality, seeing that for any eigenvalue $\lambda_i \in \lambda(A_o)$, orthonormalising
optimisation should concern the vector $l_i = V_{1,i} \nu_i$, parameterised by $\nu_i \in \mathbb{R}^m$
(see (6.19)). This clearly yields the following formulae:

$$\nu_i = V_{V_{1,i}} \underline{\Sigma}_{V_{1,i}}^{-1} v_{M_i,1}, \quad v_i = V_{2,i} \nu_i \quad \text{and} \quad l_i = \underline{U}_{V_{1,i}} v_{M_i,1}, \tag{6.22}$$

where

• the matrices $\underline{U}_{V_{1,i}} \in \mathbb{R}^{n \times m}$, $\underline{\Sigma}_{V_{1,i}} \in \mathbb{R}^{m \times m}$ and $V_{V_{1,i}} \in \mathbb{R}^{m \times m}$ are
obtained via the svd of the upper sub-matrix $V_{1,i}$ of the matrix $V_i$ associated
with $T_i$:

$$V_{1,i} = \begin{bmatrix} \underline{U}_{V_{1,i}} & \overline{U}_{V_{1,i}} \end{bmatrix} \begin{bmatrix} \underline{\Sigma}_{V_{1,i}} \\ 0_{(n-m) \times m} \end{bmatrix} V_{V_{1,i}}^T,$$

while $V_{2,i} \in \mathbb{R}^{m \times m}$ denotes the lower sub-matrix of the previously examined
partition of $V_i$ (see Lemma 6.5);

• the vector $v_{M_i,1} \in \mathbb{R}^m$ denotes the first right singular vector of the
following auxiliary matrix:

$$M_i = U_{L_{i-1}} U_{L_{i-1}}^T \underline{U}_{V_{1,i}} \in \mathbb{R}^{n \times m}$$

associated with the largest (first) singular value of this matrix.

Now we employ the equality $\mathrm{Im}(V_{1,i}) = \mathrm{Im}(\underline{U}_{V_{1,i}})$. Similarly to the
development shown in the previous subsection, any current solution provided
by (6.21) requires verification and should be rejected if the inclusion
$\mathrm{Im}(V_{1,i}) \subset \mathrm{Im}(L_{i-1})$ is detected. An effective test for checking the validity
of $l_i$ takes the usual form of $U_{L_{i-1}} U_{L_{i-1}}^T l_i \neq 0_n$. Analogously, the admissible
multiplicity of a given eigenvalue being subject to the design is equal to
$\mathrm{rank}\big([\, V_{1,i} \; L_{i-1} \,]\big) - (i - 1)$.

It is worth noticing here that using the method described in this subsection
should always be recommended in the ‘numerical contexts’, in which we
are not sure if the condition $\lambda_i \notin \lambda(A)$ is ‘strongly’ satisfied, i.e. when the
matrix $(\lambda_i I_n - A^T)$ is poorly conditioned and cannot be exactly inverted.
Clearly, the method presented in Subsection 6.6.1 is computationally cheaper
since the corresponding svd concerns smaller matrices $S_i \in \mathbb{R}^{n \times m}$, whereas
the method of Subsection 6.6.2 deals with $T_i \in \mathbb{R}^{n \times (n+m)}$.



6.7. Synthesis of a numerically robust decoupled 
state observer 

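Consider a matrix $L_w = [\, l_1 \; \cdots \; l_w \,] \in \mathbb{R}^{n \times w}$ having columns composed
of the attainable left eigenvectors $\{l_i\}_{i=1}^{w}$ associated with a given
set of eigenvalues $\{\lambda_i\}_{i=1}^{w}$ of the observer state matrix. From (6.11) it
follows that the optimal matrix $W_w$, assuring the minimum of the norm
$\|L_w^T - WC\| = \|L_w - C^T U_d W_w^T\|$, can be described as $(W_w^{*})^T = (C^T U_d)^{+} L_w$.
Hence the following index can serve as a convenient measure of disturbance
decoupling:

$$\chi(L_w) = \big\|L_w - C^T U_d (W_w^{*})^T\big\| = \big\|P_{\mathrm{Im}(C^T U_d)^{\perp}} L_w\big\|,$$

where $P_{\mathrm{Im}(C^T U_d)^{\perp}} = I_n - C^T U_d (C^T U_d)^{+} \in \mathbb{R}^{n \times n}$ denotes a matrix of
projection onto the orthogonal complement of the column space of the
matrix $C^T U_d \in \mathbb{R}^{n \times w}$. As $\mathrm{rank}(C^T U_d) = w$, the svd of this matrix
takes the form $C^T U_d = U_C \Sigma_C V_C^T$, where $U_C = [\, \underline{U}_c \; U_c \,] \in \mathbb{R}^{n \times n}$ and
$V_C \in \mathbb{R}^{w \times w}$ are orthonormal matrix factors, $\underline{U}_c \in \mathbb{R}^{n \times w}$, $U_c \in \mathbb{R}^{n \times (n-w)}$,
while $\Sigma_C = [\, \underline{\Sigma}_c \; 0_{w \times (n-w)} \,]^T \in \mathbb{R}^{n \times w}$ contains a non-singular sub-matrix
$\underline{\Sigma}_c \in \mathbb{R}^{w \times w}$ with diagonally placed singular values of $C^T U_d$. Consequently,
the projection matrix can be represented by $P_{\mathrm{Im}(C^T U_d)^{\perp}} = U_c U_c^T$ and the
resulting disturbance decoupling index is expressed as $\chi(L_w) = \|U_c U_c^T L_w\|$.
The above implies that the complete matching of the matrices $WC$ with
$L_w^T$, which is equivalent to zeroing the disturbance decoupling index $\chi(L_w)$,
can be achieved if and only if $l_i \in \mathrm{Ker}(U_c^T) = \mathrm{Im}(\underline{U}_c)$, $\forall i \in \{1, \ldots, w\}$.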
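The decoupling index can be evaluated directly from the svd of $C^T U_d$. The sketch below assumes NumPy and the spectral norm (the text does not fix the norm); the function name and the small example matrices are hypothetical. One eigenvector lies in $\mathrm{Im}(C^T U_d)$ and one does not.

```python
import numpy as np

def decoupling_index(L_w, C, U_d):
    """chi(L_w) = || U_c U_c^T L_w ||, with U_c spanning Im(C^T U_d)^perp."""
    G = C.T @ U_d                              # n x w, full column rank assumed
    w = G.shape[1]
    U_c = np.linalg.svd(G)[0][:, w:]           # last n - w columns of the full left factor
    return np.linalg.norm(U_c @ U_c.T @ L_w, 2)   # spectral norm (assumed choice)

# Hypothetical data: n = 3, m = 2, w = 1.
C = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
U_d = np.array([[0.0], [1.0]])                 # so that C^T U_d is the second unit vector

L_matched = np.array([[0.0], [1.0], [0.0]])                      # lies in Im(C^T U_d)
L_mismatched = np.array([[0.0], [1.0], [1.0]]) / np.sqrt(2.0)    # partly outside it

chi_matched = decoupling_index(L_matched, C, U_d)
chi_mismatched = decoupling_index(L_mismatched, C, U_d)
print(chi_matched, chi_mismatched)
```

For the matched vector the optimal weight $(W_w^{*})^T = (C^T U_d)^{+} L_w$ reproduces $L_w$ exactly, which is why the index vanishes.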



6.7.1. Numerically robust attainable decoupling 

Let $\{\lambda_i\}_{i=1}^{w}$ denote a set of fixed eigenvalues and let us focus our attention
on the search for a set of associated attainable left eigenvectors $\{l_i\}_{i=1}^{w}$. By
virtue of the above discussion, we know that the demand for assuring possibly
maximal decoupling of the designed residue from the plant disturbances
can be equivalently expressed as the requirement for minimising the angular
measure between these eigenvectors and the sub-space $\mathrm{Ker}(U_c^T)$.

By combining the above with the previously derived conditions for the
numerical robustness of the resulting observer (see Section 6.6), we conclude
that now we should minimise the angular distance between a unity-norm
attainable left eigenvector $l_i$, $\|l_i\| = 1$, $i = 1, \ldots, w$, and the sub-space
$\mathrm{Ker}(U_c^T) \cap \mathrm{Im}(L_{i-1})^{\perp} \subset \mathbb{R}^n$.

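The great convenience of this rule follows from the fact that the required
optimal solutions can be easily obtained by applying the simple techniques of
Subsections 6.6.1 and 6.6.2. It is sufficient to determine an appropriate basis
for the following sub-spaces: $\mathrm{Ker}(U_c^T)$ if $i = 1$, and $\mathrm{Ker}(U_c^T) \cap \mathrm{Im}(L_{i-1})^{\perp}$
if $i = 2, \ldots, w$. The first case is obvious, while in the second case we can
employ the following formula (Meyer, 2000; Petkov et al., 1991):

$$\mathrm{Ker}(U_c^T) \cap \mathrm{Im}(L_{i-1})^{\perp} = \big(\mathrm{Ker}(U_c^T)^{\perp} + \mathrm{Im}(L_{i-1})\big)^{\perp}.$$

Next, by taking into account the fact that $\mathrm{Ker}(U_c^T)^{\perp} = \mathrm{Im}(U_c)$, we acquire

$$\mathrm{Ker}(U_c^T) \cap \mathrm{Im}(L_{i-1})^{\perp} = \big(\mathrm{Im}(U_c) + \mathrm{Im}(L_{i-1})\big)^{\perp} = \mathrm{Im}(\tilde{L}_{i-1})^{\perp},$$

where an extended matrix $\tilde{L}_{i-1} \in \mathbb{R}^{n \times (n-w+i-1)}$ is defined as follows:

$$\tilde{L}_{i-1} = \begin{cases} U_c & \text{for } i = 1, \\ [\, U_c \; L_{i-1} \,] & \text{for } i = 2, \ldots, w. \end{cases}$$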
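An orthonormal basis of $\mathrm{Im}(\tilde{L}_{i-1})^{\perp}$ can be read off the svd of the extended matrix. The following is a minimal sketch, assuming NumPy; the helper name `perp_basis`, the rank tolerance and the toy matrices `U_c` and `L_prev` (for $n = 4$, $w = 3$, $i = 2$) are hypothetical.

```python
import numpy as np

def perp_basis(M, tol=1e-10):
    """Orthonormal basis of Im(M)^perp taken from the full svd of M."""
    U, s, _ = np.linalg.svd(M)                        # full left factor, n x n
    rank = int(np.sum(s > tol * max(s.max(), 1.0)))   # numerical rank of M
    return U[:, rank:]                                # columns beyond the rank span Im(M)^perp

# Hypothetical data for n = 4, w = 3, i = 2.
U_c = np.array([[1.0], [0.0], [0.0], [0.0]])          # basis of Im(C^T U_d)^perp, n x (n - w)
L_prev = np.array([[0.0], [1.0], [0.0], [0.0]])       # first eigenvector l_1

L_ext = np.hstack([U_c, L_prev])                      # extended matrix for i = 2
U_ext = perp_basis(L_ext)                             # basis of Ker(U_c^T) ∩ Im(L_prev)^perp

print(U_ext.shape)
print(np.allclose(U_c.T @ U_ext, 0.0), np.allclose(L_prev.T @ U_ext, 0.0))
```

Using the numerical rank instead of the column count keeps the sketch valid when the columns of the extended matrix happen to be linearly dependent.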



Hence an optimal $l_i$ can be obtained via minimising the angular measure $\angle(l_i, \mathrm{Im}(\tilde{L}_{i-1})^{\perp})$. This task can be effectively resolved by applying the previously developed methods of the parameterisation of attainable left eigensubspaces. Thus the formula (6.21) is utilised for the case of $\lambda_i \notin \Lambda(A)$, while (6.22) is used if $\lambda_i \in \Lambda(A)$. The only modification concerns the matrix $\bar{U}_{L_{i-1}} \in \mathbb{R}^{n \times (n-(i-1))}$, which establishes the orthonormal basis for $\mathrm{Im}(L_{i-1})^{\perp}$ and follows from the svd of $L_{i-1}$. Now the matrix $\bar{U}_{L_{i-1}}$ should be replaced by a matrix $\bar{U}_{\tilde{L}_{i-1}} \in \mathbb{R}^{n \times (n-\tilde{n}_{i-1})}$, $\tilde{n}_{i-1} = \mathrm{rank}(\tilde{L}_{i-1})$, which constitutes the orthonormal basis for the sub-space $\mathrm{Im}(\tilde{L}_{i-1})^{\perp}$ associated with the extended matrix $\tilde{L}_{i-1} = [\, U_c \ \ L_{i-1} \,]$, and can also be derived via the svd of $\tilde{L}_{i-1}$.
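The basis construction described above can be sketched numerically. The following is a minimal illustration only, assuming NumPy; the function names `complement_basis` and `decoupling_robust_subspace`, the tolerance, and the toy matrices are hypothetical and do not come from the chapter.

```python
import numpy as np

def complement_basis(M, tol=1e-10):
    """Orthonormal basis of Im(M)^perp, read off from the full SVD of M."""
    U, s, _ = np.linalg.svd(M, full_matrices=True)
    rank = int(np.sum(s > tol))
    # The trailing left singular vectors span the orthogonal complement of Im(M).
    return U[:, rank:]

def decoupling_robust_subspace(U_c, L_prev=None):
    """Basis of Ker(U_c^T) for i = 1, or of Ker(U_c^T) ∩ Im(L_prev)^perp for i >= 2."""
    L_ext = U_c if L_prev is None else np.hstack([U_c, L_prev])  # extended matrix
    return complement_basis(L_ext)

# Toy data (assumed for illustration): n = 4, Im(U_c) spanned by e1 and e2.
U_c = np.eye(4)[:, :2]
l_1 = np.array([[0.0, 0.0, 1.0, 0.0]]).T      # an already selected eigenvector
B1 = decoupling_robust_subspace(U_c)          # case i = 1
B2 = decoupling_robust_subspace(U_c, l_1)     # case i = 2
```

In this toy case `B1` has two columns and `B2` has one, which mirrors the way the admissible sub-space shrinks by one dimension with each newly selected eigenvector.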
As can be easily seen, the optimal vector $l_i$ exists if and only if $\tilde{n}_{i-1} < n$. Since for $i \le w$ we have $\tilde{n}_{i-1} \le n - w + (i - 1) < n$, it follows that this formal condition always holds. Of interest here are the particular conditions of the applicability of such vectors, presented below.

First of all, as a comment to the above, let us observe that if for a settled $i \in \{1, \dots, w\}$ it holds that $l_i \in \mathrm{Im}(L_{i-1})$ or $l_i \in \mathrm{Im}(U_c)$, then $l_i$ is orthogonal to $\mathrm{Im}(\tilde{L}_{i-1})^{\perp}$, i.e., $\bar{U}_{\tilde{L}_{i-1}}^T l_i = 0$. Hence a numerically robust detection inequality $\bar{U}_{\tilde{L}_{i-1}}^T l_i \neq 0$, being a contradiction to the necessary condition of the above inclusions, seems to be a convenient base for the rational acceptance of $l_i$ as the $i$-th optimal eigenvector of the observer state matrix. For such a selection both of the inclusions $l_i \in \mathrm{Im}(L_{i-1})$ and $l_i \in \mathrm{Im}(U_c)$ are impossible: the former is rejected by virtue of the linear independence of the eigenvectors, while the latter is rejected by the disturbance decoupling obligation.

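The maximum admissible multiplicity of an eigenvalue $\lambda_i$ is equal to $\mathrm{rank}([\, L_{i-1} \ \ S_i \,]) - \mathrm{rank}(L_{i-1})$ if $\lambda_i \notin \Lambda(A)$, or $\mathrm{rank}([\, L_{i-1} \ \ V_{i,1} \,]) - \mathrm{rank}(L_{i-1})$ for $\lambda_i \in \Lambda(A)$.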

As a conclusion to this subsection, let us note that the proposed algorithm for the synthesis of the left modal matrix $L_n$, containing columns which are the attainable left eigenvectors $\{l_i\}_{i=1}^{n}$ of the observer state matrix for an assumed spectrum, is based on two principles. The resulting residue should be both maximally decoupled from the plant disturbances (searching for $\{l_i\}_{i=1}^{w}$) and numerically insensitive to the plant model uncertainties (searching for $\{l_i\}_{i=w+1}^{n}$).

6.7.2. Complete observer gain 

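Let $L_n = [\, l_1 \ \cdots \ l_n \,] \in \mathbb{R}^{n \times n}$ denote the left modal matrix with columns composed of the attainable left eigenvectors $\{l_i\}_{i=1}^{n}$ associated with a given set of eigenvalues $\{\lambda_i\}_{i=1}^{n}$ of the observer state matrix. The observer gain $K_o \in \mathbb{R}^{n \times m}$ is now (compare (6.20)) determined from

$$K_o = -\left(V_n L_n^{-1}\right)^T, \qquad (6.23)$$

where $V_n = [\, v_1 \ \cdots \ v_n \,] \in \mathbb{R}^{m \times n}$ is the matrix of the sub-space parametrising vectors.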



6.8. Completely decoupled observers 

In this section we assume that it is possible to achieve complete decoupling between the residue $y_w(k)$ and the disturbances $d_d(k)$, i.e., zeroing $\chi(L_w)$. We start our deliberations by giving the conditions for this decoupling under the supposition that at least $w$ eigenvalues $\{\lambda_i\}_{i=1}^{w}$ of the observer state matrix can be nullified. As is commonly known, such eigenvalues are represented by dead-beat modes in the transient time-domain characteristics of the observer.

6.8.1. Dead-beat design of residue generators

The following lemma can be easily derived by virtue of the sufficient condition 
(c2). 

Lemma 6.6. (Sufficient condition for complete decoupling with a dead-beat observer) To satisfy the condition (c2) it is sufficient that the following two equalities hold:

(c3) $HB_d = 0_{w \times n_d}$, $HA_o = 0_{w \times n}$,

with the first one standing for the necessary condition$^7$, and the second one for a 'strong' sufficient condition.

As a commentary to the above lemma, it is worth noticing that the assumption about the diagonalisability of $A_o$ has not been exploited. Suppose that there exists a matrix $H = WC$ such that Lemma 6.6 holds. By virtue of the assumptions made for $W$ and $C$, we have $\mathrm{rank}(H) = w$. Hence from (c3) it follows that all rows of $H$ should be arranged from the attainable (and linearly independent) left eigenvectors of the observer state matrix $A_o$ coupled with its zero eigenvalues. The equality $H = L_w^T$ is thus the necessary condition for the existence of the solution.

Note also that the same equality appears as a sufficient condition (6.13) for disturbance decoupling in Lemma 6.2, where the strong assumption about the eigenstructure of $A_o$ is obligatory. Moreover, it is obvious that if solely diagonalisable matrices with zero eigenvalues $\{\lambda_i\}_{i=1}^{w}$ are considered and the complete matching of $H$ and $L_w^T$ is achievable, then the equality $HA_o = 0_{w \times n}$, which establishes the necessary condition of the matching, holds assuredly. Thus for a diagonalisable $A_o$ with zero eigenvalues $\{\lambda_i\}_{i=1}^{w}$ the conditions $HA_o = 0_{w \times n}$ and $H = L_w^T$ are equivalent. It is also clear that for any diagonalisable matrix with non-zero eigenvalues $\{\lambda_i\}_{i=1}^{w}$ the complete matching $H = L_w^T$ does not induce the equality $HA_o = 0_{w \times n}$, since each row $l_i^T$ of $H$ satisfies $l_i^T A_o = \lambda_i l_i^T$ and therefore $HA_o = \mathrm{diag}\{\lambda_i\}_{i=1}^{w} H$.

If (c3) holds, we have

$$HT_o(z) = z^{-1}H. \qquad (6.24)$$

Because this matrix product appears in the formulae (6.5) and (6.7), describing the transfer functions $G_{rf}(z)$ and $G_{rn}(z)$, they take the simple forms

$$G_{rf}(z) = WF + z^{-1}H(E - K_oF), \qquad G_{rn}(z) = WD_n - z^{-1}HK_oD_n.$$

For a stable matrix transfer function $G(z)$, the norm $\|G\|_\infty$ can be defined as

$$\|G\|_\infty = \sup_{\theta \in (-\pi, \pi]} \bar{\sigma}(G(e^{j\theta})),$$

where $\bar{\sigma}(\cdot)$ is the spectral matrix norm, i.e., the largest singular value of the matrix argument (Green and Limebeer, 1995; Weinmann, 1991; Zhou et al., 1996). The function $\bar{\sigma}(G(e^{j\theta}))$ of $\theta \in [0, \pi)$ can be interpreted as a certain extension of the standard amplitude Bode characteristics of a scalar (SISO) plant onto the domain of multi-variable (MIMO) dynamic systems described by the matrix transfer functions $G(z)$.

$^7$ See the condition (6.12) in Lemma 6.2.

By virtue of the above it can easily be verified that each entry of the matrices $G_{rf}(z)$ and $G_{rn}(z)$ describing the influence of particular signals (faults or noise) on a residue has the following general form of a rational first-order function: $(\alpha_0 + \alpha_1 z)/z$. Consequently,

$$\left\| \frac{\alpha_0 + \alpha_1 z}{z} \right\|_\infty = \max\{|\alpha_0 + \alpha_1|, |\alpha_0 - \alpha_1|\},$$

where for $|\alpha_0 + \alpha_1| > |\alpha_0 - \alpha_1|$ we obtain the corresponding low-pass spectral characteristics (i.e., a decreasing function in $\theta$), while for $|\alpha_0 + \alpha_1| < |\alpha_0 - \alpha_1|$ we have spectral characteristics of a high-pass property (i.e., an increasing function in $\theta$).
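The closed-form norm of the first-order entry can be cross-checked against a direct frequency sweep. The sketch below is only illustrative: the function names are hypothetical, and the coefficient pair is borrowed from the first noise sub-channel of the numerical example in Section 6.9.

```python
import numpy as np

def hinf_first_order(a0, a1):
    """Closed-form H-infinity norm of (a0 + a1*z)/z."""
    return max(abs(a0 + a1), abs(a0 - a1))

def hinf_on_grid(a0, a1, num=2001):
    """Brute-force maximum of |(a0 + a1*z)/z| over z = exp(j*theta), theta in [0, pi]."""
    theta = np.linspace(0.0, np.pi, num)
    z = np.exp(1j * theta)
    return float(np.max(np.abs((a0 + a1 * z) / z)))

def is_high_pass(a0, a1):
    """True when the magnitude increases with theta, i.e. |a0 + a1| < |a0 - a1|."""
    return abs(a0 + a1) < abs(a0 - a1)

a0, a1 = -0.1021, 0.4082          # coefficients taken from Section 6.9.2
closed_form = hinf_first_order(a0, a1)
swept = hinf_on_grid(a0, a1)
```

Because $|\alpha_0 + \alpha_1 e^{j\theta}|^2 = \alpha_0^2 + \alpha_1^2 + 2\alpha_0\alpha_1\cos\theta$ is monotone in $\theta$ on $[0, \pi]$, the maximum is always attained at one of the two end points, which is why the two values agree.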

It can be interesting to inspect the way in which the remaining eigenvalues $\{\lambda_i\}_{i=w+1}^{n}$ of the observer state matrix and their corresponding attainable left eigenvectors $\{l_i\}_{i=w+1}^{n}$ affect the analysed transfer functions $G_{rf}(z)$ and $G_{rn}(z)$. This question can be answered by obtaining an explicit form of the matrix product $HK_o$, which appears in both transfer functions. From (6.13) we conclude that $H = [\, I_w \ \ 0_{w \times (n-w)} \,] L_n^T$ and next, by virtue of (6.23), that $HK_o = -[\, I_w \ \ 0_{w \times (n-w)} \,] V_n^T$. This means that the discussed two degrees of design freedom have no influence on $G_{rf}(z)$ and $G_{rn}(z)$. Therefore, in this case, the parameterisation of the eigenstructure of the state matrix of a dead-beat observer concerns solely the robustness aspect of the design, because we should simply ensure the closest possible proximity of $L_n$ to an orthonormal matrix.

It is also worth noticing that by increasing the remaining eigenvalues $\{\lambda_i\}_{i=w+1}^{n}$ we can improve, to some extent, the conditioning of the resulting left modal matrix $L_n$. This, however, can be done at the cost of slowing the observer's speed, since such de-tuned eigenvalues are located further from the origin, i.e., from the place of the time-optimal dead-beat modes. On the other hand, an opposite procedure leading to lower eigenvalues $\{\lambda_i\}_{i=w+1}^{n}$ and a prompter reaction of the observer can also slightly improve the detecting abilities of the output error signal $y_e(k)$, as shall be demonstrated in the next section.

As can be deduced from the above discussion, in the case of the 'optimal' dead-beat observer the transfer functions $G_{rf}(z)$ and $G_{rn}(z)$ take certain 'resultant' forms, i.e., they are not tuned in any way. Thus a sound design procedure should be equipped with a mechanism by means of which these transfer functions are verified taking into account important aspects of detection quality. To make this verification practical, certain numerical measures are necessary for the quantitative qualification of relevant design requirements.

Let us first define the following relative scalar index of measurement noise attenuation:

$$\eta_{n/f} = \frac{\|G_{rn}\|_\infty}{\bar{\sigma}(G_{rf}(e^{j\theta}))\big|_{\theta=0}}, \qquad (6.25)$$

which is clear from a pragmatic point of view, since a low-frequency model of fault signals and the worst case of (broadband) measurement noise are usually considered.

In view of the previous developments concerning the dead-beat observer under discussion we obtain

$$\eta_{n/f} = \frac{\|WD_n - z^{-1}HK_oD_n\|_\infty}{\|WF + H(E - K_oF)\|}.$$

It is clear that such indices can be defined for each coordinate of the residual vector. Note that in the case of a particular scalar 'diagnostic channel', the required computations are of an elementary character, as can be seen from the above-given example of infinity-norm evaluation for the first-order rational transfer function.

Obviously, the index $\eta_{n/f}$ of (6.25) can also be used in the evaluation of the quality of partially decoupled observers. In such a case, however, an additional criterion is necessary to indicate the degradation of the observer detecting properties. We believe that, apart from the index$^8$ of mismatching $\chi(L_w)$ being of a 'purely numerical' nature, the following frequency-domain index of disturbance decoupling can be applied:

$$\eta_{d/f} = \frac{\|G_{rd}\|_\infty}{\bar{\sigma}(G_{rf}(e^{j\theta}))\big|_{\theta=0}},$$

which is also based on the worst-case principle with respect to plant disturbances. For completely decoupled observers we certainly have $\eta_{d/f} = 0$.

Finally, let us discuss the typical situation of low-pass models of faults and rather high-pass frequency characteristics of measurement noise, which leads to the postulate that the transfer functions $G_{rf}(z)$ and $G_{rn}(z)$ being designed are both low-pass, with a sufficiently big value of $\|G_{rf}\|_\infty$ and a small value of $\|G_{rn}\|_\infty$. Most often, both requirements can hardly be satisfied simultaneously. For example, when dealing with a fault of the sensor by letting $E = 0_{n \times m}$ and $F = I_m$, we have $G_{rf}(z) = W - z^{-1}HK_o$ and hence $G_{rn}(z) = G_{rf}(z) D_n$. Thus a small $\|G_{rn}\|_\infty$ excludes a 'good' $\|G_{rf}\|_\infty$. For low-pass measurement noise not much can be done, whereas with high-frequency noise we certainly should shape $G_{rn}(z)$ so as to acquire sufficient attenuation for the respective high-frequency band. On the other hand, while considering a fault in the controlling channel that can be modelled by setting $E = B_u$ and $F = D_u$, in the particular case of $D_u = 0_{m \times n_u}$ we obtain $G_{rf}(z) = z^{-1}WCB_u$ of a constant magnitude.

$^8$ Introduced in the previous section.

6.8.2. Residue generation using parity equations

Let us discuss a meaningful case in which the influence of both the measurement noise and the uncertainty of the initial conditions of estimation can be neglected, i.e., $d_n(k) = 0_{n_n}$ and $x_e(k) = 0_n$, $\forall k$, respectively. Let also the assumptions of Lemma 6.6 hold. By virtue of (6.5) and (6.24), the following time-domain representation of the residue $y_w(k)$ can be derived (Kowalczuk and Suchomski, 1998):

$$y_w(k) = WFf(k) + H(E - K_oF)f(k - 1).$$

The corresponding $z$-domain representation takes the form

$$y_w(z) = (W - z^{-1}HK_o)\,y(z) - [\, WD_u + z^{-1}H(B_u - K_oD_u) \,]\,u(z),$$

which directly leads to the following effective non-recursive algorithm of a first-order parity equation (see Chow and Willsky, 1984; Gertler, 1998; Lou et al., 1986; Patton and Chen, 1991b):

$$y_w(k) = [\, -HK_o \ \ W \,] \begin{bmatrix} y(k-1) \\ y(k) \end{bmatrix} - [\, H(B_u - K_oD_u) \ \ WD_u \,] \begin{bmatrix} u(k-1) \\ u(k) \end{bmatrix}. \qquad (6.26)$$
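The non-recursive form of (6.26) translates directly into a few array operations. The sketch below is a hedged illustration, not the authors' implementation: the function name and argument names are hypothetical, the numerical row vectors are those of the example in Section 6.9.1, and $WD_u = 0$ is assumed because the example's parity equation contains no $u(k)$ term.

```python
import numpy as np

def parity_residual(y, u, HKo, W, HBuKoDu, WDu):
    """First-order parity equation (6.26).

    y: (N, m) measured outputs, u: (N, n_u) control inputs (rows are time samples).
    HKo, W: (w, m); HBuKoDu = H(B_u - K_o D_u), WDu = W D_u: (w, n_u).
    Returns the residue y_w(k) for k = 1, ..., N-1 as an (N-1, w) array.
    """
    y = np.asarray(y, dtype=float)
    u = np.asarray(u, dtype=float)
    output_part = y[:-1] @ (-HKo).T + y[1:] @ W.T      # [-HK_o  W][y(k-1); y(k)]
    input_part = u[:-1] @ HBuKoDu.T + u[1:] @ WDu.T    # [H(B_u-K_oD_u)  WD_u][u(k-1); u(k)]
    return output_part - input_part

# Numbers from the example of Section 6.9.1 (W D_u = 0 is an assumption here).
HKo = np.array([[0.1021, -0.3062]])
W = np.array([[0.4082, -0.8165]])
HBuKoDu = np.array([[-0.4695]])
WDu = np.array([[0.0]])

r_input_only = parity_residual(np.zeros((5, 2)), np.ones((5, 1)), HKo, W, HBuKoDu, WDu)
r_output_only = parity_residual(np.tile([1.0, 0.0], (5, 1)), np.zeros((5, 1)),
                                HKo, W, HBuKoDu, WDu)
```

With zero outputs and a unit input the residue equals $0.4695\,u(k-1)$, in agreement with the sign of the input term in the example's parity equation.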



6.9. Numerical example 

Let us assume that the observed stable discrete-time plant of (6.1) and (6.2) is parameterised as follows (Kowalczuk and Suchomski, 1998; Liu and Patton, 1998; Patton and Chen, 1991a):

The plant is governed by a state-feedback controller $u(k) = -K_c x(k)$ with the gain $K_c = [\, -64.9864 \ \ 1.6244 \ \ 3.6423 \,]$ tuned so as to assure the following set of closed-loop poles: $\{0.7165, 0.7165, 0.7165\}$, which with the assumed sampling period $T_0 = 0.1$ s correspond to a time constant of 0.3 s. The resulting $r = \mathrm{rank}(CB_d) = 1$ and $w = m - r = 1$ lead to a scalar residue signal $y_w(k) \in \mathbb{R}$.
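The correspondence between the closed-loop pole and the quoted time constant can be checked with the usual zero-order mapping $z = e^{-T_0/\tau}$. This is a short sketch only; the helper names are hypothetical, and the mapping itself is the standard relation for a sampled first-order mode rather than a formula stated in the chapter.

```python
import math

def pole_from_time_constant(tau, T0):
    """Discrete-time pole of a first-order mode with time constant tau, sampled at T0."""
    return math.exp(-T0 / tau)

def time_constant_from_pole(pole, T0):
    """Inverse mapping: time constant implied by a real pole in (0, 1)."""
    return -T0 / math.log(pole)

pole = pole_from_time_constant(0.3, 0.1)
tau = time_constant_from_pole(0.7165, 0.1)
```

Both directions agree with the figures in the text to the printed precision.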
6.9.1. Decoupled dead-beat residue generator

A decoupled dead-beat observer is to be designed for the required zero eigenvalue $\lambda_1 = \lambda_2 = 0$ of the maximally admissible multiplicity ($m = 2$) and an exemplary single eigenvalue $\lambda_3 = 0.4$. By using the proposed eigenstructure assignment techniques we obtain the following results:

• the weighting matrix corresponding to the optimal parameter value $-0.9129$:

$$W = [\, 0.4082 \ \ -0.8165 \,],$$

• the parameters of the attainable left eigenvectors of the observer state matrix:

$$V_n = \left[\begin{array}{c|cc} -0.1021 & 0.2085 & -0.0589 \\ 0.3062 & 0.0569 & -0.0140 \end{array}\right],$$

• the attainable left eigenvectors (aggregated in the form of the left modal matrix):

$$L_n = \left[\begin{array}{c|cc} 0.4082 & -0.8339 & -0.3925 \\ -0.4082 & -0.5307 & 0.7290 \\ -0.8165 & -0.1516 & -0.5608 \end{array}\right],$$

• the observer gain:

$$K_o = \begin{bmatrix} 0.1917 & -0.0833 \\ 0.1167 & 0.1667 \\ -0.0875 & 0.2500 \end{bmatrix},$$

• the observer state matrix:

$$A_o = \begin{bmatrix} 0.0583 & -0.1083 & 0.0833 \\ -0.1167 & 0.2167 & -0.1667 \\ 0.0875 & -0.1625 & 0.1250 \end{bmatrix}.$$

It can be easily verified that

$$L_n^T A_o L_n^{-T} = \begin{bmatrix} 0.0000 & 0.0000 & 0.0000 \\ 0.0000 & 0.0000 & 0.0000 \\ 0.0000 & 0.0000 & 0.4000 \end{bmatrix},$$

which means that the resultant state matrix $A_o$ has the assumed eigenvalues. This also means that the complete disturbance decoupling has been achieved since $WCB_d = 0$ and the disturbance decoupling index $\chi(L_w)$ =
2.1484 X 10“^® has, in practice, a null value. A similar almost zero effect is 
apparent from the computed necessary condition of disturbance decoupling:
HAo [ 0.0680 -0.0227 -0.1020 ] x 10“^®. The resultant left modal 
matrix, being almost perfectly matched with an orthonormal matrix, is well conditioned with $\kappa(L_n) = 1.0258$.
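The printed results can be cross-checked numerically. The sketch below, assuming NumPy, re-derives the observer gain from (6.23), the modal form of the observer state matrix, and the condition number of the left modal matrix from the four-digit values quoted above; the variable names are illustrative, and agreement is expected only up to the rounding of the printed figures.

```python
import numpy as np

# Four-digit values as printed in this subsection.
L_n = np.array([[ 0.4082, -0.8339, -0.3925],
                [-0.4082, -0.5307,  0.7290],
                [-0.8165, -0.1516, -0.5608]])
V_n = np.array([[-0.1021, 0.2085, -0.0589],
                [ 0.3062, 0.0569, -0.0140]])
K_o_printed = np.array([[ 0.1917, -0.0833],
                        [ 0.1167,  0.1667],
                        [-0.0875,  0.2500]])
A_o = np.array([[ 0.0583, -0.1083,  0.0833],
                [-0.1167,  0.2167, -0.1667],
                [ 0.0875, -0.1625,  0.1250]])

K_o = -(V_n @ np.linalg.inv(L_n)).T            # observer gain from (6.23)
modal = L_n.T @ A_o @ np.linalg.inv(L_n.T)     # expected close to diag(0, 0, 0.4)
H = L_n[:, :1].T                               # first left eigenvector as a row, H = WC
HA_o = H @ A_o                                 # dead-beat condition H A_o = 0
kappa = np.linalg.cond(L_n)                    # 2-norm condition number
```

The recomputed gain matches the printed $K_o$ to roughly four decimal places, and the product $HK_o$ reproduces the first column of $V_n$ with the opposite sign, as implied by (6.23).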

Moreover, we have the transfer matrix

$$T_o(z) = \frac{1}{z(z - 0.4)} \begin{bmatrix} z - 0.3417 & -0.1083 & 0.0833 \\ -0.1167 & z - 0.1833 & -0.1667 \\ 0.0875 & -0.1625 & z - 0.2750 \end{bmatrix}.$$

As $H = WC = [\, 0.4082 \ \ -0.4082 \ \ -0.8165 \,]$, it follows that (6.24) is satisfied:

$$HT_o(z) = z^{-1} [\, 0.4082 \ \ -0.4082 \ \ -0.8165 \,].$$

Consequently, the parity equation (6.26) generating $y_w(k)$ takes the following form:

$$y_w(k) = [\, -0.1021 \ \ 0.3062 \ \ 0.4082 \ \ -0.8165 \,] \begin{bmatrix} y(k-1) \\ y(k) \end{bmatrix} + 0.4695\, u(k-1).$$

6.9.2. Properties of dead-beat observers

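In this subsection two separate types of faults will be examined. The first case of model faults concerning the system controlling channel (actuators) is characterised by $E = B_u$, $F = D_u$ and by the following particular signal $f(t) \in \mathbb{R}$ of a temporary deterministic fault:

$$f(t) = \begin{cases} 0 & \text{for } t < 100 \text{ s}, \\ 0.5 & \text{for } 100 \text{ s} \le t \le 120 \text{ s}, \\ 0 & \text{for } t > 120 \text{ s}. \end{cases}$$

In the other case we shall check the observer capabilities with reference to a fault in the system measurement channel (sensors), modelled by $E = 0_{n \times m}$ and $F = I_m$, with an exemplary deterministic two-dimensional signal $f(t) = [\, f_1(t) \ \ f_2(t) \,]^T \in \mathbb{R}^2$ described by

$$f_1(t) = \begin{cases} 0 & \text{for } t < 80 \text{ s}, \\ 0.5 & \text{for } 80 \text{ s} \le t \le 140 \text{ s}, \\ 0 & \text{for } t > 140 \text{ s}, \end{cases} \qquad f_2(t) = \begin{cases} 0 & \text{for } t < 100 \text{ s}, \\ 1 & \text{for } 100 \text{ s} \le t \le 160 \text{ s}, \\ 0 & \text{for } t > 160 \text{ s}. \end{cases}$$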
A respective simulation experiment has been performed, including the following models of plant disturbance and measurement noise:

• the disturbance $d_d \in \mathbb{R}$, which is a Gaussian process obtained by shaping a prototype zero-mean white-noise process of dispersion 1.5 with the aid of a first-order filter of a time constant of 3 s;

• the measurement noise $d_n \in \mathbb{R}^2$, composed of two Gaussian processes gained by filtering prototype zero-mean white-noise processes of dispersion 0.1 via first-order filters of time constants of 2 s.
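Signals of this kind can be generated along the following lines. This is a rough sketch under stated assumptions: the chapter does not give the discretisation of the shaping filters, so a unity-gain first-order recursion with pole $e^{-T_0/\tau}$ and $T_0 = 0.1$ s is assumed, 'dispersion' is interpreted as the standard deviation of the prototype white noise, and all function names and seeds are hypothetical.

```python
import numpy as np

def first_order_filter(w, tau, T0=0.1):
    """Unity-DC-gain first-order low-pass: x(k) = a*x(k-1) + (1-a)*w(k-1), a = exp(-T0/tau)."""
    a = np.exp(-T0 / tau)
    x = np.zeros(len(w))
    for k in range(1, len(w)):
        x[k] = a * x[k - 1] + (1.0 - a) * w[k - 1]
    return x

def shaped_gaussian_noise(num_samples, sigma, tau, T0=0.1, seed=0):
    """Zero-mean Gaussian white noise of standard deviation sigma, shaped by the filter above."""
    rng = np.random.default_rng(seed)
    return first_order_filter(rng.normal(0.0, sigma, num_samples), tau, T0)

# 200 s of data at T0 = 0.1 s (assumed simulation horizon).
d_d = shaped_gaussian_noise(2000, sigma=1.5, tau=3.0)
d_n = np.column_stack([shaped_gaussian_noise(2000, 0.1, 2.0, seed=s) for s in (1, 2)])
step_response = first_order_filter(np.ones(2000), tau=3.0)
```

A different reading of 'dispersion' (variance instead of standard deviation) or a different filter normalisation would only rescale the generated signals.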

The results concerning the assumed faults in the actuator and sensor channels are presented in Figs. 6.1 and 6.2, respectively. The plots denoted by (a) and (b) represent the output errors $y_e(k) \in \mathbb{R}^2$, the plots marked with (c) and (d) illustrate the filtered output errors $\bar{y}_e(k) \in \mathbb{R}^2$ obtained by a standard averaging technique based on a twenty-sample window, while the plot indicated with (e) represents the weighted residue $y_w(k) \in \mathbb{R}$. We observe that the output error signal $y_e(k)$ exhibits rather weak detecting capabilities. A similar remark concerns the filtered error signals $\bar{y}_e(k)$, whose use could be motivated by the naive hope that some symptoms of faults should be more visible in the smoothed error signals, even at the cost of a delay in the detection of the fault.

It is worth noticing that only the problem of fault detection has been examined here, because the 'isolation structure' of the residues is not the subject of our interest.

The noise channel of the residue generator is characterised by the transfer matrix $G_{rn}(z) = WD_n - z^{-1}HK_oD_n$, which in the analysed case takes the following form:

$$G_{rn}(z) = [\, G_{rn1}(z) \ \ G_{rn2}(z) \,] = \left[\, \frac{\alpha_{01} + \alpha_{11}z}{z} \ \ \ \frac{\alpha_{02} + \alpha_{12}z}{z} \,\right].$$

It can be easily verified that

$$\|G_{rn}\|_\infty = \max\left\{ \sqrt{(\alpha_{01} + \alpha_{11})^2 + (\alpha_{02} + \alpha_{12})^2},\ \sqrt{(\alpha_{01} - \alpha_{11})^2 + (\alpha_{02} - \alpha_{12})^2} \right\}.$$

Moreover, we observe that the frequency characteristic $\bar{\sigma}(G_{rn}(e^{j\theta}))$ is of a low-pass nature (a decreasing function in $\theta$) if $(\alpha_{01} + \alpha_{11})^2 + (\alpha_{02} + \alpha_{12})^2 > (\alpha_{01} - \alpha_{11})^2 + (\alpha_{02} - \alpha_{12})^2$. On the other hand, if this inequality takes the opposite sign we acquire the spectral characteristic of a high-pass property (an increasing function in $\theta$). Taking into account the numerical details of the object under examination yields the following result:

$$G_{rn}(z) = \left[\, \frac{-0.1021 + 0.4082z}{z} \ \ \ \frac{0.3062 - 0.8165z}{z} \,\right].$$

It is also obvious that all of the analysed spectral characteristics of the transfer function $G_{rn}(z)$, i.e., $\bar{\sigma}(G_{rn1}(e^{j\theta})) = |G_{rn1}(e^{j\theta})|$, $\bar{\sigma}(G_{rn2}(e^{j\theta})) = |G_{rn2}(e^{j\theta})|$, and $\bar{\sigma}(G_{rn}(e^{j\theta}))$, are high-pass functions (confer Fig. 6.3). Moreover, the applied indices $\|G_{rn1}\|_\infty = 0.5103$, $\|G_{rn2}\|_\infty = 1.1227$, and $\|G_{rn}\|_\infty = 1.2332$ testify that the examined measurement channel is dominated by its second noise sub-channel.




Fig. 6.3. Spectral characteristics of a decoupled dead-beat observer 



By taking into account the fault in the actuator channel and proceeding in a similar fashion with respect to the principal task of the residue generator, we obtain the transfer function $G_{rf}(z) = WF + z^{-1}H(E - K_oF)$ of (6.5) in the following form:

$$G_{rf}(z) = -0.4695\, z^{-1}.$$

Hence, the detecting attribute of the observer referring to the controlling channel faults is characterised by $\bar{\sigma}(G_{rf}(e^{j\theta}))|_{\theta=0} = |G_{rf}(1)| = 0.4695$, while the measurement noise decoupling index amounts to $\eta_{n/f} = 8.39$ dB.

When considering faults in the sensor channel, we derive the previously obtained form of the fault-effect transfer function:

$$G_{rf}(z) = [\, G_{rf1}(z) \ \ G_{rf2}(z) \,] = G_{rn}(z),$$

for which the noise-to-fault index equals $\eta_{n/f} = 6.33$ dB.
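The quoted norms and indices can be reproduced from the first-order coefficients alone. The sketch below is illustrative: the helper names are hypothetical, and expressing the ratio (6.25) in decibels as $20\log_{10}(\cdot)$ is an assumption, although it does reproduce the printed values.

```python
import numpy as np

def sigma_max_row(coeffs, theta):
    """Largest singular value of the row [(a0 + a1*z)/z, ...] at z = exp(j*theta).

    For a row vector this is simply its Euclidean norm.
    """
    z = np.exp(1j * theta)
    row = np.array([(a0 + a1 * z) / z for a0, a1 in coeffs])
    return float(np.linalg.norm(row))

def hinf_row(coeffs, num=2001):
    """H-infinity norm estimated on a grid of theta in [0, pi] (end points included)."""
    return max(sigma_max_row(coeffs, th) for th in np.linspace(0.0, np.pi, num))

def to_db(x):
    return 20.0 * np.log10(x)

G_rn = [(-0.1021, 0.4082), (0.3062, -0.8165)]   # noise channel of Section 6.9.2
G_rf_actuator = [(-0.4695, 0.0)]                # G_rf(z) = -0.4695 / z
G_rf_sensor = G_rn                              # sensor faults: G_rf(z) = G_rn(z)

norm_rn = hinf_row(G_rn)
eta_actuator = to_db(norm_rn / sigma_max_row(G_rf_actuator, 0.0))
eta_sensor = to_db(norm_rn / sigma_max_row(G_rf_sensor, 0.0))
```

The maximum of the noise characteristic is found at $\theta = \pi$, which is consistent with the high-pass character noted in the text.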

The plots shown in Fig. 6.3 reveal that one should expect the detecting abilities of the residue generator to be deteriorated by wide-band (high-pass) measurement disturbances. This claim is also exemplified by the plot of Fig. 6.4, where the residue signal responding to the controlling channel fault is detected in the presence of measurement noise generated with the aid of wide-band filters of time constants of 0.05 s.

Fig. 6.4. Residual signal in the presence of wide-band measurement noise

6.9.3. Properties of non-dead-beat observers

Let us re-consider the detection problem for the fault in the controlling channel. It seems interesting to explore the possibility of shaping the observer properties by tuning its double eigenvalue ($\lambda_1 = \lambda_2$), assuming, at the same time, that the third eigenvalue stays unaltered, $\lambda_3 = 0.4$. Taking a non-zero $\lambda_1$ we deliberately give up the maximal speed of fault detection, hoping for an improvement in measurement noise attenuation.

Two exemplary values of $\lambda_1$ are taken into account, namely, $\lambda_1 = 0.325$ and $\lambda_1 = 0.65$. Both of them lead to the residue-generator output completely decoupled from the plant disturbance $d_d(k)$, as we have
x{L>w) = 3.4805 X 10“^® and xi^w) = 5.2533 x 10“^®, respectively. What 
is more, the corresponding left modal matrices $L_n$ are also well conditioned, as $\kappa(L_n) = 2.7577$ and $\kappa(L_n) = 1.8857$. Since $\lambda_1 \neq 0$, we should expect a non-zero $HA_o$ and, indeed, $HA_o = [\, -0.1327 \ \ 0.1327 \ \ 0.2654 \,]$ and $HA_o = [\, 0.2654 \ \ -0.2654 \ \ -0.5307 \,]$, respectively. The spectral characteristics $\bar{\sigma}(G_{rf}(e^{j\theta}))$ and $\bar{\sigma}(G_{rn}(e^{j\theta}))$ are depicted in Fig. 6.5, where the previously obtained results concerning the decoupled dead-beat observer (having $\lambda_1 = \lambda_2 = 0$) are given for comparative purposes.

As one could expect, both of the functions $\bar{\sigma}(G_{rf}(e^{j\theta}))$ are of a low-pass form, and therefore $\|G_{rf}\|_\infty = \bar{\sigma}(G_{rf}(e^{j\theta}))|_{\theta=0}$. While increasing the eigenvalue $\lambda_1$ we observe that the norm $\|G_{rf}\|_\infty$ takes larger values, which can be interpreted as an advantageous phenomenon provided that low-pass signals are adequate models of possible actuator faults. A rational interpretation concerning the transfer function $G_{rn}(z)$ appears to be more complex. Taking, for instance, a greater $\lambda_1$ implies the required attenuation of $\bar{\sigma}(G_{rn}(e^{j\theta}))$ at higher frequencies. At the same time, however, there is a disadvantageous increase in this spectral characteristic for lower frequencies, which means that the resultant observer is more sensitive to low-pass (drift) measurement disturbances.

Fig. 6.5. Spectral characteristics of decoupled observers: (a) $\bar{\sigma}(G_{rf}(e^{j\theta}))$, (b) $\bar{\sigma}(G_{rn}(e^{j\theta}))$

The results presented above clearly confirm the claim that designing detection observers is principally a multiobjective optimisation task, in which more than one particular viewpoint should be effectively taken into account. In the case of $\lambda_1 = 0.325$ we have $\|G_{rf}\|_\infty = 0.6955$ and $\|G_{rn}\|_\infty = 0.9307$, which imply $\eta_{n/f} = 2.53$ dB, whereas taking $\lambda_1 = 0.65$ gives $\|G_{rf}\|_\infty = 1.3410$ and $\|G_{rn}\|_\infty = 1.7000$, and, consequently, $\eta_{n/f} = 2.06$ dB. The difference between the two settings is thus not so significant, which can be seen from the residues given in Fig. 6.6.

Fig. 6.6. Residue signals in the presence of wide-band measurement noise: (a) $\lambda_1 = 0.325$, (b) $\lambda_1 = 0.65$
After a close examination of the plot in Fig. 6.6(b) one can ascertain a clear delay in the respective fault-detection process as a consequence of the non-dead-beat solution. Obviously, increasing $\lambda_1$ will intensify this tendency.

While accepting that certain trade-offs are often to be taken into account, in the next chapter an additional optimisation approach shall be presented that, based on filtration with respect to the $H_\infty$ norm, renders new perspectives for dealing with residue-generator design problems.



6.9.4. Robustness of dead-beat observers 



Concluding this illustrative section, let us examine the robustness property of decoupled dead-beat observers. Taking into account two observers with differently conditioned modal matrices, we shall compare the previously analysed observer obtained for the set of eigenvalues $\{\lambda_1 = \lambda_2 = 0, \lambda_3 = 0.4\}$ with an observer designed for $\lambda_1 = \lambda_2 = 0$ and a 'fast' eigenvalue $\lambda_3 = 0.05$. The
gain of this completely decoupled observer (having xi^w) = 2.1484 x 10“^®) 
takes the form 



$$K_o = \begin{bmatrix} -0.1583 & -0.0833 \\ 0.8167 & 0.1667 \\ -0.6125 & 0.2500 \end{bmatrix}.$$

However, its left modal matrix,

$$L_n = \left[\begin{array}{c|cc} 0.4082 & -0.8339 & -0.8476 \\ -0.4082 & -0.5307 & -0.5016 \\ -0.8165 & -0.1516 & -0.1730 \end{array}\right],$$

is characterised by the condition number $\kappa(L_n) = 51.8668$, which is over fifty times worse.

To indicate the negative effect of such a significant growth of this index 
of numerical robustness, we can examine the time-domain properties of the 
resultant residue. A simulation experiment was arranged as follows: in two 
cases the observer gain was imperfectly implemented (simply the entries of 
Ko were subject to unstructured uniformly distributed multiplicative per- 
turbations of 1% intensity), and then the residue signal was computed. Some 
exemplary simulation results are presented in Fig. 6.7. 



Fig. 6.7. Residues generated by detection observers with a perturbed gain: (a) well-conditioned solution, (b) ill-conditioned solution

6.10. Summary

The problem of the parameterisation and suitable representation of the attainable left eigenspace of the observer state matrix with a fixed set of real eigenvalues has been discussed in detail. Being linear in principle, this parameterisation serves as a useful basis for the proposed synthesis of state observers, optimised with respect to specific criteria imposed by their application to the generation of residual vectors within diagnostic systems. These criteria principally concern two design tasks. The first one refers to the static decoupling of residues generated by the observer from the disturbances which act on the monitored plant, while the other one concerns observer robustness, which manifests itself in a high-quality performance of the system even when some uncertainty of the plant model is present.

The main idea of the discussed approach lies in the appropriate param- 
eterisation of a sub-space attainable for the left eigenvectors of the obser- 
vation state-transition matrix. The proposed consequent design is based on 
the possibility of the analytical determination of these vectors. In a complete 
design procedure we observe the following steps. First, the eigenvalues are 
optimised taking into account all the necessary criteria of fault detectability, 
disturbance and noise attenuation, and certain parametric robustness. And 
second, the associated eigenvectors are analytically shaped, which secures the 
orthogonalisation or minimal conditioning of the corresponding modal ma- 
trix (as a measure of system parametric robustness to modelling errors). It 
is also possible to optimise eigenvalues for the best fault detectability and to 
design system eigenvectors analytically (Chen and Patton, 1999; Kowalczuk 
and Suchomski, 1998; 1999) based on the conditions of disturbance decoupling. This approach is definitely better than the $H_\infty$-design, which does not generally lead to completely decoupled systems (Chen and Patton, 1999). On
the other hand, if at the designing stage of disturbance decoupling the ideal 
conditions cannot be obtained, then only approximate decoupling, restricted 
by the attainable eigensubspace, can be performed. 

For the sake of the effectiveness and clarity of the presentation, singular value decomposition (svd), which is an effective numerical tool for obtaining fine operative results, has been utilised. In the case of implementational constraints concerning the computation time, it is always possible to employ another decomposition, such as the QR algorithm (Golub and Van Loan, 1996; Quarteroni et al., 2000; Stewart, 2001), which is considerably cheaper in the numerical sense.



Appendices 

A. Observability of dynamic systems 

Two standard results concerning the observability of discrete-time dynamic plants are given below.

Lemma 6.7. (Some facts about observability) Consider the following discrete-time model of a linear dynamic plant:

$$x(k+1) = Ax(k),$$
$$y(k) = Cx(k),$$

where $A \in \mathbb{R}^{n \times n}$ and $C \in \mathbb{R}^{m \times n}$. The pair $(A, C)$ is said to be completely observable if, for any $0 < k_f < \infty$, the initial state $x(0) \in \mathbb{R}^n$ can be determined from the time history of the output $y : [0, k_f] \to \mathbb{R}^m$. The following statements are mutually equivalent:

(a) The pair $(A, C)$ is completely observable.

(b) The matrix

$$W_o(k) = \sum_{i=0}^{k} (A^T)^i C^T C A^i \in \mathbb{R}^{n \times n}$$

is positive definite, $\forall k > 0$: $W_o(k) > 0$. The observability Gramian, defined for asymptotically stable plants by

$$W_o = \sum_{i=0}^{\infty} (A^T)^i C^T C A^i \in \mathbb{R}^{n \times n},$$

is positive definite, $W_o > 0$.

(c) The observability matrix

$$M_o = \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{n-1} \end{bmatrix}$$

has a full column rank: $\mathrm{rank}(M_o) = n$, which means that $\mathrm{Ker}(M_o) = \{0_n\}$.

(d) For all $\lambda \in \mathbb{R}$:

$$\mathrm{rank}\,[\, \lambda I_n - A^T \ \ C^T \,] = n.$$

(e) The pair $(A, C)$ is not similar to any pair $(\tilde{A}, \tilde{C})$ of the following form:

$$\tilde{A} = \begin{bmatrix} \tilde{A}_{11} & 0_{(n-n_o) \times n_o} \\ \tilde{A}_{21} & \tilde{A}_{22} \end{bmatrix}, \qquad \tilde{C} = [\, \tilde{C}_1 \ \ 0_{m \times n_o} \,],$$

where $1 \le n_o \le n$.

(f) If for all $\lambda \in \mathbb{R}$ the following equation:

$$C(\lambda I_n - A)^{-1}v = 0_m, \qquad v \in \mathbb{R}^n,$$

holds, then $v = 0_n$.

(g) Let $\lambda \in \Lambda(A)$ and $v \in \mathbb{R}^n$ be the corresponding right eigenvector of $A$, i.e., $Av = \lambda v$; then $Cv \neq 0_m$.

The above can be easily generalised to cover the field $\mathbb{C}$ of complex numbers, i.e., by letting $\lambda \in \mathbb{C}$ and $v \in \mathbb{C}^n$.
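Two of the equivalent tests, (c) and (d), are easy to try out numerically. The sketch below assumes NumPy; the function names, the tolerance, and the two-state example are hypothetical. The rank test of (d) is evaluated only at the eigenvalues of $A$, since the stacked matrix has full rank for every other $\lambda$.

```python
import numpy as np

def observability_matrix(A, C):
    """Stack C, CA, ..., CA^(n-1) as in statement (c)."""
    blocks = [C]
    for _ in range(A.shape[0] - 1):
        blocks.append(blocks[-1] @ A)
    return np.vstack(blocks)

def is_observable_rank(A, C, tol=1e-10):
    """Statement (c): the observability matrix has full column rank."""
    return np.linalg.matrix_rank(observability_matrix(A, C), tol=tol) == A.shape[0]

def is_observable_pbh(A, C, tol=1e-10):
    """Statement (d), checked at the eigenvalues of A on the transposed (stacked) form."""
    n = A.shape[0]
    for lam in np.linalg.eigvals(A):
        pencil = np.vstack([lam * np.eye(n) - A, C])
        if np.linalg.matrix_rank(pencil, tol=tol) < n:
            return False
    return True

A = np.array([[0.5, 1.0],
              [0.0, 0.8]])
C_good = np.array([[1.0, 0.0]])   # measures the first state: observable
C_bad = np.array([[0.0, 1.0]])    # measures only the decoupled second state
```

For `C_bad` the eigenvector $[1, 0]^T$ of the eigenvalue 0.5 is annihilated by $C$, which is exactly the situation excluded by statement (g).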

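Lemma 6.8. (On the null sub-space of the observability matrix) Consider a pair $(A, C)$ with $A \in \mathbb{R}^{n \times n}$ and $C \in \mathbb{R}^{m \times n}$. The null sub-space

$$\mathrm{Ker}(M_o) = \bigcap_{i=0}^{n-1} \mathrm{Ker}(CA^i) \subset \mathbb{R}^n$$

of the observability matrix $M_o \in \mathbb{R}^{nm \times n}$ of $(A, C)$ is an $A$-invariant sub-space, i.e., $v \in \mathrm{Ker}(M_o) \subset \mathbb{R}^n$ implies $Av \in \mathrm{Ker}(M_o)$.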

B. Useful geometric relationships 

Let I G Im (5) C E^ , where 5 G E^^® is a given matrix of a full column rank 
rank (5) = s < n. It follows that I Sv for the corresponding u G E^ . An 
angle /.{l,lm{Q)) between I and a given sub-space Im(Q) C E”^, described 
by a full column rank matrix Q eR^^^ of rank((5) = q < n, is determined 
as (Petkov et al, 1991; Weinmann, 1991): 

cos (Z(/,Im(Q))) = 

where Pim{Q) G E^^^ is a matrix of orthogonal projection on Im((5). 
Consider now the following problem: 

VQ = argmax ||Pim(Q)5t;|| 
ueR” 



||St,||=l 



(6.27) 
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of searching for a vector 

Iq - SvQ e Im (5) 

of the unity length, which is characteristic of the minimal angular distance 
to the given sub-space Im((5). The above-mentioned projection matrix can 
by functionally obtained via the svd of Q. Since we have assumed that Q is 
of a full column rank, the matrix can be shown in a factorised form: 



Q = UqHqV^, 

with orthonormal matrix factors Uq G and Vq G , while 

= [ Sg Oqx{n-q) V ^ , where Eg G denotes a non- 

singular diagonal matrix containing all singular values of Q. Taking a parti- 
tion Uq = [ U_Q Uq ] with U_Q ^ and Uq G we obtain 

Plm{Q) = HqUq. 

Similarly, the svd of S leads to the following: 

5 = Us^sVj, 

where Us = [ U_s Us ] ^ MP^'^ and Vs G are orthonormal, U_s ^ 
]gnxs ^ C /5 G , whereas E5 = [ E5 Osx(n-s) V ^ contains a 

non-singular diagonal sub-matrix E5 G . Hence we acquire the following 
representation of ||i|p: 



\\l\f = \\Svf=v^Vs^lV^v. 

Letting h = 'LgVjv G W leads to the following equivalent form of the 
above problem (6.27): 

/iQ = arg max (6.28) 

Thus, by considering the svd of the auxiliary matrix M = UqU^Us^ 
£nxs exposed above, we conclude that the first right singular vector vm,i G 
W of this matrix, i.e., the right singular vector associated with the largest 
singular value ctm,i of represents a solution to (6.28). The necessary 
svd of M has the standard form of M = Um'^mV^, where Um ^ 
and Vm = [ vm,i ^ orthonormal, Em G de- 

notes the corresponding block diagonal matrix containing the singular val- 
ues {oTMj}j=i of M. It can be easily shown that \\Mh\\ = HEm^II, where 
h = V^h G E^. By assuming that the singular values of M are ordered 
non-increasingly, i.e., (Jm,i = we immediately infer that the 

length of Em^ reaches its maximum for /i = /ii = [ 1 0 • • • 0 G E®. 
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Consequently, we obtain the sought optimal solution as hq — Vm/^i = 

On the basis of the above, we can present the following lemma. 

Lemma 6.9. (Vector of a minimal slope to a given sub-space) Problem (6.27) 
always has a solution, which can be determined as follows: 

$$v_Q = V_S \underline{\Sigma}_S^{-1} v_{M,1} \quad\text{and}\quad l_Q = \underline{U}_S v_{M,1}.$$

Note that in the above statement we have conveniently used the following 
orthonormal bases of the analysed column sub-spaces: $\mathrm{Im}(S) = \mathrm{Im}(\underline{U}_S)$ and 
$\mathrm{Im}(Q) = \mathrm{Im}(\underline{U}_Q)$, respectively. 
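
As a numerical illustration of Lemma 6.9, the following is a minimal sketch in Python with NumPy. The function name `min_slope_vector` and the use of NumPy are assumptions of this example, not part of the original text; Q and S are assumed to have full column rank. The returned angle uses the fact that the cosine of the angle between a unit vector and a sub-space equals the length of its orthogonal projection onto that sub-space.

```python
import numpy as np

def min_slope_vector(Q, S):
    """Sketch of Lemma 6.9: unit vector l_Q in Im(S) with minimal angle to Im(Q)."""
    q = Q.shape[1]
    s = S.shape[1]
    # Orthonormal bases of Im(Q) and Im(S): the leading left singular vectors.
    UQ, _, _ = np.linalg.svd(Q, full_matrices=True)
    US, sigS, VSt = np.linalg.svd(S, full_matrices=True)
    UQ_r = UQ[:, :q]
    US_r = US[:, :s]
    # Auxiliary matrix M = U_Q U_Q^T U_S (projection of the basis of Im(S) onto Im(Q)).
    M = UQ_r @ UQ_r.T @ US_r
    _, sigM, VMt = np.linalg.svd(M)
    vM1 = VMt[0]                      # first right singular vector of M
    v_Q = VSt.T @ (vM1 / sigS)        # v_Q = V_S Sigma_S^{-1} v_{M,1}
    l_Q = US_r @ vM1                  # l_Q = U_S v_{M,1} = S v_Q
    angle = np.arccos(np.clip(sigM[0], 0.0, 1.0))
    return v_Q, l_Q, angle
```

For instance, with Q spanning the xy-plane of R^3 and S spanning the direction (1, 0, 1), the sketch returns an angle of pi/4.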

Finally, let us consider two extreme cases concerning the angle between 
a vector and a sub-space, discussed below. 

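Minimal slope. It is clear that $\angle(l_Q, \mathrm{Im}(Q)) = 0$ if and only if $P_{\mathrm{Im}(Q)} l_Q = l_Q$, which is the case when $l_Q \in \mathrm{Im}(Q)$. A necessary and sufficient condition for $\angle(l_Q, \mathrm{Im}(Q)) = 0$ thus takes the following form: $\dim(\mathrm{Im}(Q) \cap \mathrm{Im}(S)) > 0$. Certainly, this inequality holds if $\mathrm{Im}(Q) \cap \mathrm{Im}(S) \ne \{0_n\}$, or, particularly, if $\mathrm{Im}(S) \subset \mathrm{Im}(Q)$ or $\mathrm{Im}(Q) \subset \mathrm{Im}(S)$. On the other hand, it follows that if $\mathrm{rank}[\,Q\ \ S\,] = q + s$, the minimum slope $\angle(l_Q, \mathrm{Im}(Q)) = 0$ cannot be reached. An orthonormal basis of the intersection $\mathrm{Im}(Q) \cap \mathrm{Im}(S)$ can be effectively obtained via the svd of the analysed matrices. By virtue of the evident identity

$$\mathrm{Im}(Q) \cap \mathrm{Im}(S) = (\mathrm{Im}(Q)^\perp + \mathrm{Im}(S)^\perp)^\perp = \mathrm{Im}([\,\overline{U}_Q\ \ \overline{U}_S\,])^\perp,$$

which uses the 'internal' partitioning applied to the products of svd within (6.27)-(6.28), we have $\mathrm{Im}(Q) \cap \mathrm{Im}(S) = \mathrm{Im}(\overline{U}_{QS})$, where $\overline{U}_{QS} \in \mathbb{R}^{n\times(n-n_{QS})}$ denotes another matrix partition found in an appropriate product of the svd of the matrix $[\,\overline{U}_Q\ \ \overline{U}_S\,] \in \mathbb{R}^{n\times((n-q)+(n-s))}$, having the rank $n_{QS} = \mathrm{rank}([\,\overline{U}_Q\ \ \overline{U}_S\,])$. Namely, by assuming that $n_{QS} < n$, we have the decomposition $[\,\overline{U}_Q\ \ \overline{U}_S\,] = [\,\underline{U}_{QS}\ \ \overline{U}_{QS}\,]\,\Sigma_{QS} V_{QS}^T$ with $\underline{U}_{QS} \in \mathbb{R}^{n\times n_{QS}}$, $\Sigma_{QS} \in \mathbb{R}^{n\times((n-q)+(n-s))}$ and $V_{QS} \in \mathbb{R}^{((n-q)+(n-s))\times((n-q)+(n-s))}$, where the required basis of $\mathrm{Im}(Q) \cap \mathrm{Im}(S)$ can be arranged by the set of columns of $\overline{U}_{QS}$.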
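
The construction of the intersection basis described above can be sketched as follows. This is an illustrative Python/NumPy fragment under stated assumptions: the name `intersection_basis` and the rank tolerance `tol` are hypothetical choices of this example, Q and S are assumed to have full column rank, and at least one of them is assumed to have fewer than n columns.

```python
import numpy as np

def intersection_basis(Q, S, tol=1e-10):
    """Orthonormal basis of Im(Q) ∩ Im(S) via the svd of the stacked complements."""
    q = Q.shape[1]
    s = S.shape[1]
    UQ, _, _ = np.linalg.svd(Q, full_matrices=True)
    US, _, _ = np.linalg.svd(S, full_matrices=True)
    # Columns spanning Im(Q)^perp and Im(S)^perp, placed side by side.
    comp = np.hstack([UQ[:, q:], US[:, s:]])
    U, sig, _ = np.linalg.svd(comp, full_matrices=True)
    n_qs = int(np.sum(sig > tol))     # numerical rank n_QS of the stacked matrix
    # The trailing left singular vectors span (Im(Q)^perp + Im(S)^perp)^perp.
    return U[:, n_qs:]
```

If the returned matrix has no columns, the two sub-spaces intersect only in the zero vector, and the zero slope cannot be attained.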

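Maximal slope. It holds that $\angle(l_Q, \mathrm{Im}(Q)) = \pi/2$ if and only if $P_{\mathrm{Im}(Q)} l = 0_n$ $\forall l \in \mathrm{Im}(S)$, which is equivalent to the equality $\underline{U}_Q \underline{U}_Q^T l_Q = 0_n$. This happens if and only if $\mathrm{Im}(S) \perp \mathrm{Im}(Q)$, i.e., if $\mathrm{Im}(S) \subset \mathrm{Im}(Q)^\perp$, or, equivalently, if $\mathrm{Im}(Q) \subset \mathrm{Im}(S)^\perp$. Therefore, reaching the maximum slope is possible only when $q + s \le n$.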
Chapter 7 



ROBUST H∞-OPTIMAL SYNTHESIS 
OF FDI SYSTEMS 

Piotr SUCHOMSKI*, Zdzisław KOWALCZUK* 



7.1. Introduction 

A whole range of fundamental engineering issues (Chen and Patton, 1999a; 
Frank, 1990; Gertler, 1998; Isermann, 1984; Patton and Chen, 1993; Willsky, 
1976) is associated with the technical diagnostics of dynamic plants and 
processes. This domain encompasses three basic subdomains of diagnostic 
procedures, known as the detection, isolation and identification of faults. 
Together, they are referred to as FDI (Fault Detection and Isolation). 

As a primary basis for processing diagnostic decisions, the detection of 
faults appearing in a monitored dynamic system (plant) can be founded on 
the analysis of properly defined residues. These residues are generally defined 
as weighted errors of (multidimensional/vector) output estimates of the plant 
generated via a properly designed state estimator (Chen and Patton, 1999a; 
Chow and Willsky, 1984; Kowalczuk and Suchomski, 1998). 

Such a detective estimator (or a state observer) should clearly be highly 
capable of exposing, in the generated residues, suitable information on 
the symptoms of faults which may occur in the diagnosed object (e.g., faults 
in the measurement channel). On the other hand, 
an applicable FDI estimator should be equally insensitive (robust) to several 
factors, such as disturbances and noise acting on the plant, a structural and 
parametric uncertainty of the plant model, and the existing measurement 
noise, such as sensor disturbances (Kowalczuk and Suchomski, 1999; Magni 
and Mouyon, 1994; Patton and Chen, 1991). 
It is worth emphasising at this point the specific nature of the FDI-design 
task of synthesising robust residue generators, as distinct from other 
approaches to the standard problem of robust estimation, where only particular 
measures of the robustness of the obtained plant-state estimates with respect 
to model uncertainties are taken into account. Model uncertainties are the 
differences between the nominal model of the estimated plant (used during 
estimation-system synthesis) and the real-plant model (valid at the current 
moment in the life of the estimation system). 

It is thus a legitimate opinion that the problem of the optimisation of 
residue generators has a principally multi-objective character, and in any 
rational procedure of residual optimisation many aspects of the quality of a 
complete FDI system have to be taken into account. 

In recent years an increasing interest in H∞-based methods for control, 
estimation and FDI system synthesis has been observed (some relevant 
definitions and basic facts on H∞ can be found in Appendix B to this chapter). 
The H∞-space norm is an induced norm with respect to quadratic norms of 
the input and output signal spaces. Thus it characterises important 
input-output properties of a given dynamic system in a convenient synthetic way. 
It also yields very effective solutions to many practically important 
minimax optimisation issues, known as the problems of worst-case optimisation 
(Chen and Patton, 1999a; 2000; Edelmayer et al., 1997a; 1997b; Edelmayer 
and Bokor, 1998; Patton and Hou, 1997). 
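
To make the notion of an induced norm more tangible, the sketch below estimates the H∞ norm of a stable discrete-time state-space system by gridding the unit circle and taking the peak of the largest singular value of the frequency response. It is only an illustration: the function name `hinf_norm_grid` and the grid size are assumptions of this example, the system matrices are assumed real, and a finite grid yields a lower estimate of the true norm rather than its exact value.

```python
import numpy as np

def hinf_norm_grid(A, B, C, D, n_grid=2000):
    """Grid-based estimate of sup over theta of sigma_max(C (zI - A)^{-1} B + D), z = exp(j*theta)."""
    n = A.shape[0]
    peak = 0.0
    # For real system matrices it suffices to scan the upper half of the unit circle.
    for theta in np.linspace(0.0, np.pi, n_grid):
        z = np.exp(1j * theta)
        G = C @ np.linalg.solve(z * np.eye(n) - A, B) + D
        peak = max(peak, np.linalg.svd(G, compute_uv=False)[0])
    return peak
```

For the first-order system x(k+1) = 0.5 x(k) + u(k), y(k) = x(k), the peak gain is attained at z = 1 and equals 2.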

The particular significance of the above-mentioned H∞-space norm follows 
from the fact that the input-output properties of optimised systems are 
described in terms of commonly accepted norms with a practical 'energetic' 
connotation. This allows convenient interpretations of the results of 
optimisation with reference to easily resolvable spectral properties of the 
designed system. A disturbance of an unknown nature is simply characterised 
by the most difficult (troublesome) distribution of its spectral power 
density with respect to the quadratic norm of the output signal of an optimised 
system. That is what is generally recognised as the worst case for any 
energetically oriented goal of optimisation tasks. It is important that 
H∞-optimal methods do not require any knowledge about probability density 
functions of stochastic disturbance signals influencing the object under state 
estimation. Such knowledge is, in general, required in numerous optimal design 
methods. For example, in the Kalman filtering approach we need to assume 
particular covariances of the disturbances, preferably of the Gaussian type 
(Gertler, 1998; Kowalczuk and Suchomski, 2001; Mangoubi, 1998; Willsky and 
Jones, 1976). Thus the above-mentioned property of H∞-space approaches allows a 
convenient 'unified' treatment of disturbance signals of any nature (stochastic 
and deterministic). 

Approaches based on the H∞-norm have turned out to be 
computationally/numerically effective. A state-space formulation of the 
corresponding optimisation problem (Doyle et al., 1989) is commonly recognised 
as a turning point in the development of this approach of great practical 
importance within the system synthesis framework. 

Another significant attribute of H∞-based methods of the optimal 
synthesis of FDI filters is their unified approach to the issue of both external¹ 
and internal² uncertainties of the monitored object (Chen and Patton, 1999a; 
1999b; Chen et al., 1999; Edelmayer and Bokor, 1998; Patton and Hou, 1997; 
1999). 

In this chapter, which has an introductory character with respect to H∞ 
methods, only the foremost, external aspect of optimal residue-generator 
synthesis shall be considered. Two particular model-relevant approaches to 
the problem of disturbance de-coupling will be presented. The first one con- 
cerns the basic model of the monitored dynamic object, while the other ap- 
proach employs a corresponding dual model. In both methods the optimisa- 
tion of residues requires only one discrete-time algebraic Riccati equation to 
be solved. 



7.2. FDI design task as optimal filtering in H∞ 

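Let the monitored discrete-time plant P be described as 

$$P:\ \begin{cases} x_p(k+1) = A_p x_p(k) + B_p d_p(k) + E_p f_p(k), \\ y_p(k) = C_p x_p(k) + D_p d_p(k) + F_p f_p(k), \end{cases} \qquad (7.1)$$

where $x_p \in \mathbb{R}^n$ denotes a state, $y_p \in \mathbb{R}^m$ is a measured output, $d_p \in \mathbb{R}^q$ stands for an unmeasurable plant³ disturbance, while $f_p \in \mathbb{R}^v$ represents faults. All matrices of (7.1) are properly dimensioned: $A_p \in \mathbb{R}^{n\times n}$, $B_p \in \mathbb{R}^{n\times q}$, $C_p \in \mathbb{R}^{m\times n}$, $D_p \in \mathbb{R}^{m\times q}$, $E_p \in \mathbb{R}^{n\times v}$ and $F_p \in \mathbb{R}^{m\times v}$. For the sake of notational simplicity we do not consider any controlling signal in the above model P. It is clear that measurement noise can be easily modelled within the vector $d_p$. According to a rational (transfer matrix) modelling principle presented in Appendices A and B, we shall utilise the following convenient representation of the plant of (7.1) with the two inputs $d_p$ and $f_p$ distinguished above: 

$$P(z) = [\,P_d(z)\ \ P_f(z)\,] = \left[\begin{array}{c|cc} A_p & B_p & E_p \\ \hline C_p & D_p & F_p \end{array}\right], \qquad p = q + v,$$

$$P_d(z) = C_p(zI_n - A_p)^{-1}B_p + D_p, \qquad P_f(z) = C_p(zI_n - A_p)^{-1}E_p + F_p,$$

where $P_d(z)$ and $P_f(z)$ are rational transfer matrices of dimensions $m \times q$ and $m \times v$, respectively. 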
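
The two transfer matrices of (7.1) can be evaluated at a single point z as in the following minimal Python/NumPy sketch. The function name `plant_transfer` is an assumption of this example; the point z is assumed not to be an eigenvalue of A_p.

```python
import numpy as np

def plant_transfer(Ap, Bp, Ep, Cp, Dp, Fp, z):
    """Evaluate P_d(z) and P_f(z) of the plant (7.1) at one complex point z."""
    n = Ap.shape[0]
    resolvent = np.linalg.inv(z * np.eye(n) - Ap)   # (z I_n - A_p)^{-1}
    Pd = Cp @ resolvent @ Bp + Dp                   # disturbance channel
    Pf = Cp @ resolvent @ Ep + Fp                   # fault channel
    return Pd, Pf
```

Evaluating at z = 1 gives the steady-state (DC) gains from a constant disturbance and a constant fault to the measured output.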

¹ Expressed in terms of disturbance signals, and used for the purpose of de-coupling. 
² Which can take the form of structural or non-structural perturbations. 

³ Joined system's state and output signal perturbations. 
The signal $y_p$ is processed by a detection filter $K : y_p \mapsto r$. The purpose 
of the filtration is to provide a residual signal $r \in \mathbb{R}^v$, in which the information 
on the fault $f_p$ is optimally⁴ represented. As an objective of such optimisation 
we shall define the following vector of the error of the fault estimate (Fig. 7.1): 

$$e = f_p - r \in \mathbb{R}^v.$$




Fig. 7.1. Principal scheme of filtration 



Unmeasurable disturbances dp, influencing the residual vector r, make it 
difficult to expose information about the fault fp. This can generally degrade 
the detectability competence of the intended diagnostic system. Therefore, 
the main goal of residue-generator design is an optimal shaping of the vec- 
tor r so as to reduce the corrupting influence of disturbances and to increase 
the exposition of symptoms (information) about the faults. 



7.2.1. Optimal filtering based on the basic modelling 
of generalised processes 

Consider the standard scheme of filtering in H∞ shown in Fig. 7.2, which is 
suitable for the basic model G of a generalised dynamic object. 



Fig. 7.2. Residue filtering scheme with the basic model of the plant 



⁴ To a certain extent. 
Within the above process scheme we distinguish the following elements: 

• a generalised plant represented by its basic (primary) model $G : \begin{bmatrix} w \\ u \end{bmatrix} \mapsto \begin{bmatrix} e \\ y \end{bmatrix}$, 

• a feedback loop implemented in the form of a filter or residue generator $K : y \mapsto u$, 

• an exogenous unmeasurable signal vector $w = \begin{bmatrix} d_p \\ f_p \end{bmatrix} \in \mathbb{R}^p$, 

• a correcting signal or residue vector $u = r \in \mathbb{R}^v$, 

• a criterion (objective, fault estimation error) vector $e \in \mathbb{R}^v$, 

• a measurement or observation vector $y = y_p \in \mathbb{R}^m$. 



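The generalised plant is described by the following state-space model: 

$$G:\ \begin{cases} x(k+1) = Ax(k) + B_w w(k) + B_u u(k), \\ e(k) = C_e x(k) + D_{ew} w(k) + D_{eu} u(k), \\ y(k) = C_y x(k) + D_{yw} w(k) + D_{yu} u(k), \end{cases}$$

where $x = x_p$ denotes the state of this system, and 

$$\begin{aligned} & A = A_p \in \mathbb{R}^{n\times n}, \\ & B_w = [\,B_p\ \ E_p\,] \in \mathbb{R}^{n\times p}, \quad B_u = 0_{n\times v} \in \mathbb{R}^{n\times v}, \\ & C_e = 0_{v\times n} \in \mathbb{R}^{v\times n}, \quad C_y = C_p \in \mathbb{R}^{m\times n}, \\ & D_{ew} = [\,0_{v\times q}\ \ I_v\,] \in \mathbb{R}^{v\times p}, \quad D_{yw} = [\,D_p\ \ F_p\,] \in \mathbb{R}^{m\times p}, \\ & D_{eu} = -I_v \in \mathbb{R}^{v\times v}, \quad D_{yu} = 0_{m\times v} \in \mathbb{R}^{m\times v}. \end{aligned} \qquad (7.2)$$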
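
The assignment (7.2) is purely structural, so it can be assembled mechanically from the plant matrices of (7.1). The following Python/NumPy sketch does so; the function name `generalised_plant` and the dictionary keys are hypothetical names chosen for this illustration.

```python
import numpy as np

def generalised_plant(Ap, Bp, Ep, Cp, Dp, Fp):
    """Assemble the matrices of the basic model G according to (7.2)."""
    n, q = Bp.shape
    v = Ep.shape[1]
    m = Cp.shape[0]
    return {
        "A": Ap,
        "Bw": np.hstack([Bp, Ep]),                        # w = [d_p; f_p]
        "Bu": np.zeros((n, v)),                           # the residue does not drive the plant
        "Ce": np.zeros((v, n)),
        "Cy": Cp,
        "Dew": np.hstack([np.zeros((v, q)), np.eye(v)]),  # picks f_p out of w
        "Deu": -np.eye(v),                                # e = f_p - r
        "Dyw": np.hstack([Dp, Fp]),
        "Dyu": np.zeros((m, v)),
    }
```

With these matrices the criterion output reduces to e = D_ew w + D_eu u = f_p - r, in agreement with the definition of the fault estimation error.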

Due to the above, the operator model G, of dimension $(v+m)\times(p+v)$ and called a scattering 
transfer function matrix (Green and Limebeer, 1995; Kimura, 1995; 1997) of the 
analysed generalised process, can be obtained according to Appendix A as 

$$G(z) = \left[\begin{array}{c|c} A & B \\ \hline C & D \end{array}\right] = \begin{bmatrix} G_{ew}(z) & G_{eu}(z) \\ G_{yw}(z) & G_{yu}(z) \end{bmatrix}, \qquad (7.3)$$

where 

$$B = [\,B_w\ \ B_u\,], \qquad C = \begin{bmatrix} C_e \\ C_y \end{bmatrix}, \qquad D = \begin{bmatrix} D_{ew} & D_{eu} \\ D_{yw} & D_{yu} \end{bmatrix},$$

and 

$$G_{ew}(z) = [\,0_{v\times q}\ \ I_v\,], \quad G_{eu}(z) = -I_v, \quad G_{yw}(z) = [\,P_d(z)\ \ P_f(z)\,], \quad G_{yu}(z) = 0_{m\times v}.$$

On the basis of (7.3), the transfer function $F_{ew} : w \mapsto e$ of the resulting 
closed-loop filtering system of Fig. 7.2, equipped with the filter K (of dimension $v \times m$), 
can be easily shown to have the form 

$$F_{ew} = LF(G,K) = G_{ew} + G_{eu}K(I_m - G_{yu}K)^{-1}G_{yw},$$

which establishes a linear fractional transformation (Green and Limebeer, 
1995; Kimura, 1995; 1997) of the systems G and K of the closed-loop scheme 
under consideration. Furthermore, in view of (7.3), we can easily verify that 
this transfer function has an affine form with respect to the filter K: 

$$F_{ew} = [\,0_{v\times q}\ \ I_v\,] - K[\,P_d\ \ P_f\,].$$
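
The reduction of the linear fractional transformation to the affine form can be checked numerically at a single frequency point, where all transfer matrices become constant (possibly complex) matrices. The sketch below is an illustration only; the function names `lft_lower` and `affine_form` are assumptions of this example.

```python
import numpy as np

def lft_lower(G_ew, G_eu, G_yw, G_yu, K):
    """LF(G, K) = G_ew + G_eu K (I_m - G_yu K)^{-1} G_yw at one frequency point."""
    m = G_yu.shape[0]
    return G_ew + G_eu @ K @ np.linalg.solve(np.eye(m) - G_yu @ K, G_yw)

def affine_form(P, K, q, v):
    """[0_{v x q}  I_v] - K [P_d  P_f], with P = [P_d  P_f] of size m x (q + v)."""
    return np.hstack([np.zeros((v, q)), np.eye(v)]) - K @ P
```

With G_ew = [0 I_v], G_eu = -I_v, G_yw = P and G_yu = 0, as given in (7.3), both functions return the same matrix for any K of dimension v x m.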



Based on the norms defined in Appendix B, we can formulate the following 
design problem. 



Problem 7.1. FDI filter synthesis with respect to H∞: 

$$\text{find } K \text{ such that } \|LF(G,K)\|_\infty < \gamma. \qquad (7.4)$$

In this problem we seek a causal, linear, time-invariant, finite-dimensional, 
stable and well-posed operator⁵ $K : y \mapsto u$, $K \in RH_\infty^{v\times m}$, for which (Doyle 
et al., 1989; Zhou et al., 1996): 

(i) the closed-loop system (Fig. 7.2) is internally stable, 

(ii) the inequality $\|F_{ew}\|_\infty < \gamma$ is ensured for some (possibly small) $\gamma > 0$. 

The above norm bound is equivalent to the following evaluation of the 
criterion function⁶: 

$$\|e\|_2^2 \le (\gamma^2 - \varepsilon)\|w\|_2^2 \quad \forall\, w \in l_2[0,\infty) \text{ and some } \varepsilon > 0,$$

when assuming the zero initial conditions $x(0) = 0_n$ (Green and Limebeer, 1995). 

⁵ Represented here by a transfer function. 

⁶ Being the fault estimation error, for instance. 

The stated affine task of optimising the residual vector is equivalent to 
the standard model-matching question (Doyle et al., 1989; Zhou et al., 1996). 
In the sequel, we shall show how the issue of seeking the solution K can 
be reduced to the problem of the existence and quality of some associated 
discrete-time algebraic Riccati equations (introduced in Appendix D). 



7.2.2. Solution using the basic model 



We shall start the presentation with a general form of the solution of the 
suboptimal H∞ filtering problem (7.4) stated in terms of the basic model 
of the generalised plant. Afterwards, detailed conditions for the existence of 
this solution will be given. Moreover, we shall discuss certain aspects of the 
pertinent FDI-formulation of the H∞-filtering problem. 

With the above purposes in mind, let us make the following common 
assumptions concerning the basic model G of the generalised 
plant (Doyle et al., 1989; Green and Limebeer, 1995; Kimura, 1997): 



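(c1) the pair $(A, B_u)$ is stabilisable, 

(c2) the pair $(A, C_y)$ is detectable, 

(c3) $D_{eu}^T D_{eu} > 0$, 

(c4) $D_{yw} D_{yw}^T > 0$, 

(c5) $\mathrm{rank}\begin{bmatrix} A - e^{j\theta}I_n & B_u \\ C_e & D_{eu} \end{bmatrix} = n + v$ for $\forall\theta \in (-\pi, \pi]$, 

(c6) $\mathrm{rank}\begin{bmatrix} A - e^{j\theta}I_n & B_w \\ C_y & D_{yw} \end{bmatrix} = n + m$ for $\forall\theta \in (-\pi, \pi]$. 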



The assumptions (cl) and (c2) are necessary for the existence of stabilising 
solutions to Problem 7.1. The loosening of the conditions (c3) and/or (c4) 
leads to singular filtering (control) problems. The postulates (c5) and (c6) 
are based on technical requirements - by dropping them we would make the 
solution very complicated. 
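
The assumptions (c1), (c2), (c5) and (c6) can be screened numerically before any synthesis is attempted. The following Python/NumPy sketch uses the Popov-Belevitch-Hautus rank test for discrete-time stabilisability and detectability, and a frequency grid for the rank conditions on the unit circle. The function names, the tolerance and the grid size are assumptions of this illustration, and a finite grid can only approximate a condition stated for all angles.

```python
import numpy as np

def is_stabilisable(A, B, tol=1e-9):
    """PBH test: rank [lam*I - A, B] = n for every eigenvalue lam of A with |lam| >= 1."""
    n = A.shape[0]
    for lam in np.linalg.eigvals(A):
        if abs(lam) >= 1.0:
            pbh = np.hstack([lam * np.eye(n) - A, B])
            if np.linalg.matrix_rank(pbh, tol=tol) < n:
                return False
    return True

def is_detectable(A, C, tol=1e-9):
    """Detectability of (A, C) is stabilisability of the transposed pair."""
    return is_stabilisable(A.T, C.T, tol)

def rank_on_unit_circle(A, B, C, D, target_rank, n_grid=721, tol=1e-9):
    """Grid check of rank [[A - e^{j theta} I, B], [C, D]] = target_rank, as in (c5)/(c6)."""
    n = A.shape[0]
    for theta in np.linspace(-np.pi, np.pi, n_grid):
        pencil = np.block([[A - np.exp(1j * theta) * np.eye(n), B], [C, D]])
        if np.linalg.matrix_rank(pencil, tol=tol) < target_rank:
            return False
    return True
```

For (c5) one would call `rank_on_unit_circle(A, Bu, Ce, Deu, n + v)`, and for (c6) `rank_on_unit_circle(A, Bw, Cy, Dyw, n + m)`.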

A general theorem (Green and Limebeer, 1995) concerning the suboptimal 
H∞ filtering/approximation/control problem (7.4) stated for the basic model 
of the generalised plant can be reformulated so as to match certain practi- 
cal/numerical aspects. Namely, this approach allows recasting Discrete-time 
Algebraic Riccati Equations (DAREs) into the form of a generalised eigen- 
value problem (see Appendices C and D). 



Theorem 7.1. (The existence of a solution to Problem 7.1) Suppose that 
the discussed basic model G (of dimension $(v+m)\times(p+v)$) of a dynamic plant fulfils all 
the conditions (c1)-(c6). Then a suboptimal filter $K \in RH_\infty^{v\times m}$, satisfying 
$\|LF(G,K)\|_\infty < \gamma$, exists if and only if 

(i) $(U_x, W_x) \in \mathrm{dom}(DRic)$ and $X = DRic(U_x, W_x) \ge 0$, where the 
respective submatrices⁷ of the pair $(U_x, W_x)$ are given by 
P:, = A- B{D^J^D)-^D'^J^C, 

^ - D{D^J^D)-^D^)J^C, 

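while⁸ 

$$J_x = J_{vp}(\gamma) \in \mathbb{R}^{(v+p)\times(v+p)}, \quad C = \begin{bmatrix} C_e \\ 0_{p\times n} \end{bmatrix} \in \mathbb{R}^{(v+p)\times n}, \quad D = \begin{bmatrix} D_{ew} & D_{eu} \\ I_p & 0_{p\times v} \end{bmatrix} \in \mathbb{R}^{(v+p)\times(p+v)},$$

(ii) $\hat{M}_{x11} < 0$, where $\hat{M}_{x11} = M_{x11} - M_{x12}M_{x22}^{-1}M_{x12}^T$ is the Schur 
complement of $M_{x11} \in \mathbb{R}^{p\times p}$, being a submatrix of the following block matrix: 

$$M_x = \begin{bmatrix} M_{x11} & M_{x12} \\ M_{x12}^T & M_{x22} \end{bmatrix} = D^T J_x D + B^T X B \in \mathbb{R}^{(p+v)\times(p+v)},$$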

(Hi) (Ux,Wx) G dom(Z)Ric) and X = DRic{Ux,Wx) > 0^ where the ana- 
logue submatrices of the pair {Ux,Wx) are expressed by 

- C^{DJ^D'^)~^DJsB^, 

= BJs (js - D^{DJsD^)~^d) J^B^, 

Rs = C^{DJ^D^)~^C, 



where 



Jx = Jpvil) e E(p+”)^(p+“) , A = A- BwM~i\L 6 
5 =[s„y-i 0 „xv ] 

- I r/ 
c = 



' Ce' 




■ v,^m-^al 2 - MJ12M-AZ) ■ 


A. 




Cy — DywM^iiL 



⁷ Introduced in Appendix D. 

⁸ See Appendix C for the definition of the signature matrix $J_x$. 

and 



Cy G 



D = 



Ce G E*’"'" 

I, 

DywV^ Ojnxv 






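$$\begin{bmatrix} L_1 \\ L_2 \end{bmatrix} = D^T J_x C + B^T X A \in \mathbb{R}^{(p+v)\times n}, \qquad L_1 \in \mathbb{R}^{p\times n}, \quad L_2 \in \mathbb{R}^{v\times n},$$

$$L = L_1 - M_{x12}M_{x22}^{-1}L_2 \in \mathbb{R}^{p\times n}.$$

Moreover, $V_{x1} \in \mathbb{R}^{v\times v}$ and $V_{x2} \in \mathbb{R}^{p\times p}$ are the Cholesky factors of $M_{x22} \in \mathbb{R}^{v\times v}$ and $-\gamma^{-2}\hat{M}_{x11} \in \mathbb{R}^{p\times p}$, respectively: 

$$V_{x1}^T V_{x1} = M_{x22}, \qquad V_{x2}^T V_{x2} = -\gamma^{-2}\hat{M}_{x11},$$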



(iv) Mxii < 0, where M^n = Mxii - Mxi 2 M^ 22 ^Ii 2 ^ 

Schur complement of Mxii ^ being a submatrix of the following block 

matrix: 



Mxll Mx12 

MJi2 Mx22 



= DJxD'^ + CXC'^ G i(<^+™)x(»'+’^) . 



Theorem 7.2. (A solution formula for Problem 7.1) If the conditions given 
in Theorem 7.1 are satisfied, then the following two propositions hold: 

(i) The sought filter K G is represented by the general for- 
mula K — LF{Ti,E), where the function S G with a bound- 

ed norm ||S||oo < 7 acts as a design parameter, while the function S G 
is given by 





A + BuCy, — L2M^22^y 


Mx12M^22 + -^2Af^22 


S(^) = 


1 


-V-,^Mxi2Mr^\ 

^xl Omxv 



Bs = BuV-,^V^2 + - L2Mr^2Ml2)Vx-2^ e E”>^^ 

= -V-^\Ce - Mxi2Mr^\Cy) G E"><", 

where 



[Li Z2] = LlGE"^^ , 

with Vxi G and Vx 2 ^ being the Cholesky factors of Mx 22 ^ 

]^mxm -j~‘^Mxii G respectively: 

V^Vxl = Ms22, ^72^x2 = -7”'Mx11. 
(ii) Zeroing the design parameter (setting it to $0_{v\times m}$) leads to the simplest and most commonly used 
form of the filter, called the central filter: 



K{z) = 



A -1- BuCy. — L2M^22^y 


BuVxl ^xl2^x22 "1" ^2^x22 


Gy 





^ \ Xf{k + l) = Axf{k) + Buu{k) + L2M^^2{y(k)-CyXf{k)), 

\ u{k) = -V-,^CeXf{k) - - CyXf{k)). 

The advised solution can be schematically depicted as shown in Fig. 7.3. 



Fig. 7.3. Parameterised solution to Problem 7.1, 
the filtering task for the basic model G 



Analysing the above conditions (c1)-(c6), stated for Problem 7.1 
formulated with respect to the basic model G, leads to the following conclusions, 
respectively: 

(c1) By virtue of (7.2) we see that the fulfilment of this condition requires 
a stable matrix A, and so it means that the plant P is stable as well. 

(c2) Satisfying (c1) implies that the condition (c2) is valid for $\forall\, C_p$. 

(c3) From (7.2) it follows that this condition (c3) is always satisfied. 

(c4) This condition is equivalent to $D_pD_p^T + F_pF_p^T > 0$, which means that 
$D_{yw}$ should have its full row rank: $\mathrm{rank}\,D_{yw} = m \le p$. 

(c5) Taking into account (7.2) we conclude that the condition (c5) implies 
that P(z) cannot have poles on the unit circle, which is a 
weaker requirement as compared to (c1). 

(c6) By (7.2) the condition (c6) forces that P(z) cannot have zeros on the unit circle. 
As a continuation of our discussion, certain operative aspects of the 
analysed problem (7.4) of H∞-optimal FDI filtration founded on the basic model 
G of the monitored dynamic plant are listed below. 

Characteristics 7.1. 

1. Taking into account (7.2), we have $P_x = A_p$ and $Q_x = 0_{n\times n}$. In 
this case, the zero solution $X = 0_{n\times n}$ to the corresponding DARE satisfies 
the condition (i) of Theorem 7.1, which can be checked immediately from 
$\lambda(P_x - R_x(I_n + XR_x)^{-1}XP_x) = \lambda(A_p) \subset \{z : |z| < 1\}$. 

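2. For $X = 0_{n\times n}$ we obtain 

$$M_x = D^T J_x D = \begin{bmatrix} D_{ew}^T D_{ew} - \gamma^2 I_p & -D_{ew}^T \\ -D_{ew} & I_v \end{bmatrix}$$

along with $\hat{M}_{x11} = -\gamma^2 I_p < 0$. It thus follows that with $X = 0_{n\times n}$ the 
condition (ii) of Theorem 7.1 is also fulfilled. 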

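3. Moreover, zeroing $X = 0_{n\times n}$ implies that $L_1 = 0_{p\times n}$ and $L_2 = 0_{v\times n}$, which leads to the conclusion that $L = 0_{p\times n}$ as well. The Cholesky 
factors $V_{x1}$ and $V_{x2}$ take the simplest forms of unity matrices: $V_{x1} = I_v$ 
and $V_{x2} = I_p$. Hence 

$$\tilde{A} = A_p, \quad \tilde{B} = [\,B_w\ \ 0_{n\times v}\,], \quad \tilde{C} = \begin{bmatrix} 0_{v\times n} \\ C_p \end{bmatrix}, \quad \tilde{D} = \begin{bmatrix} -D_{ew} & I_v \\ D_{yw} & 0_{m\times v} \end{bmatrix}.$$

On the above assumptions, we also deduce that the matrix 

$$\tilde{D} J_s \tilde{D}^T = \begin{bmatrix} (1-\gamma^2) I_v & -F_p^T \\ -F_p & D_pD_p^T + F_pF_p^T \end{bmatrix} \in \mathbb{R}^{(v+m)\times(v+m)}$$

is invertible: 

$$(\tilde{D} J_s \tilde{D}^T)^{-1} = \begin{bmatrix} H_2 & H_2F_p^TH_1 \\ H_1F_pH_2 & H_1 + H_1F_pH_2F_p^TH_1 \end{bmatrix},$$

where 

$$H_1 = (D_pD_p^T + F_pF_p^T)^{-1} \in \mathbb{R}^{m\times m}, \qquad H_2 = ((1-\gamma^2)I_v - F_p^TH_1F_p)^{-1} \in \mathbb{R}^{v\times v}.$$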

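4. Zeroing the design parameter (setting it to $0_{v\times m}$) leads to the following simplified form 
of the central filter: 

$$K(z) = \left[\begin{array}{c|c} A_p - K_xC_p & K_x \\ \hline \tilde{M}_{x12}\tilde{M}_{x22}^{-1}C_p & -\tilde{M}_{x12}\tilde{M}_{x22}^{-1} \end{array}\right],$$

which can be implemented in the following innovation state-space form: 

$$\begin{cases} x_f(k+1) = A_px_f(k) + K_x\,(y(k) - C_px_f(k)), \\ u(k) = \tilde{M}_{x12}\tilde{M}_{x22}^{-1}C_px_f(k) - \tilde{M}_{x12}\tilde{M}_{x22}^{-1}y(k), \end{cases}$$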

where 

Kx = {U - S„Msi 2)M-22 e 
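
The innovation form of the central filter is a standard full-order observer recursion followed by a static output map, so it is straightforward to run sample by sample. In the Python/NumPy sketch below, the observer gain `Kx` and the output gain `N` (standing for the product of the two M-blocks in the residue equation) are taken as given inputs. The function name `run_central_filter` and the symbol `N` are assumptions of this illustration; the gains themselves would have to come from the solution of the corresponding DARE.

```python
import numpy as np

def run_central_filter(Ap, Cp, Kx, N, y_seq, xf0=None):
    """Run x_f(k+1) = A_p x_f + K_x (y - C_p x_f), u = N C_p x_f - N y over a sequence."""
    n = Ap.shape[0]
    xf = np.zeros(n) if xf0 is None else np.asarray(xf0, dtype=float)
    residues = []
    for y in y_seq:
        innovation = np.asarray(y, dtype=float) - Cp @ xf   # y(k) - C_p x_f(k)
        residues.append(-N @ innovation)                    # u(k) = -N (y(k) - C_p x_f(k))
        xf = Ap @ xf + Kx @ innovation                      # observer state update
    return np.array(residues)
```

The residue is thus a static function of the innovation, which is the only place where the measurement enters the filter.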

As a comment to the above findings, below we provide four inferences of 
practical meaning. 

1. The discussed problem of FDI filtering established on the basic model 
G requires only one discrete-time algebraic Riccati equation DARE to be 
solved. Moreover, the solution can be obtained by employing the standard 
numerical procedures. 

2. The filter $K \in RH_\infty^{v\times m}$ has a standard full-order observer-based 
structure of the dimension n, which is equal to the dimension of the dynamic 
plant P. 

3. It is important that within the specific area of FDI methods the stability 
requirement for P is not critical. This claim will be illustrated in Section 7.3, 
where we shall demonstrate how to obtain P applicable to the problem of 
state observer synthesis. 

4. Seeking the optimal parameterisation of the filter K given by Theorem 7.2 
appears to be very interesting. Nevertheless, this issue is beyond the scope of 
the present work. 



7.2.3. Optimal filtering based on the dual modelling 
of generalised plants 



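As follows from (7.3), the transfer function $G_{eu}(z)$ is invertible. We can thus 
uniquely transform the basic model G of the generalised plant into its dual 
model 

$$\hat{G} : \begin{bmatrix} e \\ w \end{bmatrix} \mapsto \begin{bmatrix} u \\ y \end{bmatrix},$$

described by its dual chain-scattering transfer function matrix (Kimura, 
1995; 1997): 

$$\hat{G} = DCHAIN(G) = \begin{bmatrix} G_{eu}^{-1}(z) & -G_{eu}^{-1}(z)G_{ew}(z) \\ G_{yu}(z)G_{eu}^{-1}(z) & G_{yw}(z) - G_{yu}(z)G_{eu}^{-1}(z)G_{ew}(z) \end{bmatrix}.$$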
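As can easily be verified, 

$$DCHAIN(G) = \begin{bmatrix} I_v & 0_{v\times p} \\ G_{yu} & G_{yw} \end{bmatrix} \begin{bmatrix} G_{eu} & G_{ew} \\ 0_{p\times v} & I_p \end{bmatrix}^{-1} = \begin{bmatrix} G_{eu} & 0_{v\times m} \\ G_{yu} & -I_m \end{bmatrix}^{-1} \begin{bmatrix} I_v & -G_{ew} \\ 0_{m\times v} & -G_{yw} \end{bmatrix}.$$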

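By virtue of the above, we acquire the following state-space representation 
of the dual model: 

$$\hat{G}:\ \begin{cases} x(k+1) = \hat{A}x(k) + \hat{B}_e e(k) + \hat{B}_w w(k), \\ u(k) = \hat{C}_u x(k) + \hat{D}_{ue} e(k) + \hat{D}_{uw} w(k), \\ y(k) = \hat{C}_y x(k) + \hat{D}_{ye} e(k) + \hat{D}_{yw} w(k), \end{cases}$$

where 

$$\begin{aligned} & \hat{A} = A - B_uD_{eu}^{-1}C_e \in \mathbb{R}^{n\times n}, \\ & \hat{B}_e = B_uD_{eu}^{-1} \in \mathbb{R}^{n\times v}, \quad \hat{B}_w = B_w - B_uD_{eu}^{-1}D_{ew} \in \mathbb{R}^{n\times p}, \\ & \hat{C}_u = -D_{eu}^{-1}C_e \in \mathbb{R}^{v\times n}, \quad \hat{C}_y = C_y - D_{yu}D_{eu}^{-1}C_e \in \mathbb{R}^{m\times n}, \\ & \hat{D}_{ue} = D_{eu}^{-1} \in \mathbb{R}^{v\times v}, \quad \hat{D}_{uw} = -D_{eu}^{-1}D_{ew} \in \mathbb{R}^{v\times p}, \\ & \hat{D}_{ye} = D_{yu}D_{eu}^{-1} \in \mathbb{R}^{m\times v}, \quad \hat{D}_{yw} = D_{yw} - D_{yu}D_{eu}^{-1}D_{ew} \in \mathbb{R}^{m\times p}. \end{aligned}$$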



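Taking into account the detailed forms of particular matrices, we derive 
the dual model as 

$$\hat{G}(z) = \left[\begin{array}{c|c} \hat{A} & \hat{B} \\ \hline \hat{C} & \hat{D} \end{array}\right] = \begin{bmatrix} \hat{G}_{ue}(z) & \hat{G}_{uw}(z) \\ \hat{G}_{ye}(z) & \hat{G}_{yw}(z) \end{bmatrix}, \qquad (7.5)$$

where 

$$\hat{A} = A_p, \qquad \hat{B} = [\,\hat{B}_e\ \ \hat{B}_w\,] = [\,0_{n\times v}\ \ B_w\,] \in \mathbb{R}^{n\times(v+p)},$$

$$\hat{C} = \begin{bmatrix} \hat{C}_u \\ \hat{C}_y \end{bmatrix} = \begin{bmatrix} 0_{v\times n} \\ C_p \end{bmatrix} \in \mathbb{R}^{(v+m)\times n}, \qquad \hat{D} = \begin{bmatrix} \hat{D}_{ue} & \hat{D}_{uw} \\ \hat{D}_{ye} & \hat{D}_{yw} \end{bmatrix} = \begin{bmatrix} -I_v & D_{ew} \\ 0_{m\times v} & D_{yw} \end{bmatrix},$$

$$\hat{G}_{ue}(z) = -I_v, \quad \hat{G}_{uw}(z) = [\,0_{v\times q}\ \ I_v\,], \quad \hat{G}_{ye}(z) = 0_{m\times v}, \quad \hat{G}_{yw}(z) = [\,P_d(z)\ \ P_f(z)\,].$$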
Fig. 7.4. Filtering for the dual model 



With the assumed form of the filter $K \in RH_\infty^{v\times m}$ we can draft the following 
scheme of a dual filtering task as shown in Fig. 7.4. 

The relation $u = Ky$ can be readily expressed in the equality form: 
$[\,I_v\ \ {-K}\,]\begin{bmatrix} u \\ y \end{bmatrix} = 0_v$. Hence, by virtue of (7.5), we have 

$$[\,I_v\ \ {-K}\,]\begin{bmatrix} \hat{G}_{ue} & \hat{G}_{uw} \\ \hat{G}_{ye} & \hat{G}_{yw} \end{bmatrix}\begin{bmatrix} e \\ w \end{bmatrix} = 0_v$$

and 

$$(\hat{G}_{ue} - K\hat{G}_{ye})\,e + (\hat{G}_{uw} - K\hat{G}_{yw})\,w = 0_v.$$

The above determines the required transfer-function map $F_{ew} : w \mapsto e$, 
which can be shown in the form of the following dual homographic 
transformation (Kimura, 1997): 

$$DHM(\hat{G},K) = -(\hat{G}_{ue} - K\hat{G}_{ye})^{-1}(\hat{G}_{uw} - K\hat{G}_{yw}).$$

In the sequel, the following two easy-to-check properties of this 
transformation shall play a significant role: 

$$DHM(G_1, DHM(G_2,K)) = DHM(G_2G_1,K),$$

$$DHM(DCHAIN(G),K) = LF(G,K).$$

The first formula illustrates the fundamental principle of the cascade 
connection of systems described by their dual chain matrices, while the second 
formula shows the equivalence of the discussed suboptimal tasks of filtering 
in H∞. 
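
The second property, DHM(DCHAIN(G), K) = LF(G, K), can be verified numerically on constant matrices, which is what the transfer matrices reduce to at any fixed frequency point. The Python/NumPy sketch below is an illustration only; the function names `lft_lower`, `dchain` and `dhm`, and the ordering of the four blocks in the returned tuple, are assumptions of this example.

```python
import numpy as np

def lft_lower(G_ew, G_eu, G_yw, G_yu, K):
    """LF(G, K) = G_ew + G_eu K (I_m - G_yu K)^{-1} G_yw."""
    m = G_yu.shape[0]
    return G_ew + G_eu @ K @ np.linalg.solve(np.eye(m) - G_yu @ K, G_yw)

def dchain(G_ew, G_eu, G_yw, G_yu):
    """Dual chain-scattering blocks (H_ue, H_uw, H_ye, H_yw); G_eu must be invertible."""
    G_eu_inv = np.linalg.inv(G_eu)
    return (G_eu_inv,
            -G_eu_inv @ G_ew,
            G_yu @ G_eu_inv,
            G_yw - G_yu @ G_eu_inv @ G_ew)

def dhm(H, K):
    """DHM(H, K) = -(H_ue - K H_ye)^{-1} (H_uw - K H_yw)."""
    H_ue, H_uw, H_ye, H_yw = H
    return -np.linalg.solve(H_ue - K @ H_ye, H_uw - K @ H_yw)
```

The check does not rely on the special structure of (7.3): it holds for any blocks of compatible sizes, provided that the required inverses exist.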



Problem 7.2. FDI filter synthesis optimal with respect to H∞: 

$$\text{find } K \text{ such that } \|DHM(DCHAIN(G),K)\|_\infty < \gamma. \qquad (7.6)$$



7.2.4. Solution using the dual model 

The key to an effective solution to the stated problem of H∞-filtering with 
respect to the dual model $\hat{G} = DCHAIN(G)$ is its appropriate 
representation in terms of rational factors. It can be shown that if the 
function $\hat{G}$ has the so-called $(J_{vm}, J_{vp})$-lossless⁹ factorisation 
$\hat{G} = \Omega\Psi$, where $\Omega \in GH_\infty^{(v+m)\times(v+m)}$, and $\Psi$ is a dual $(J_{vm}, J_{vp})$-lossless 
matrix function, then a filter $K \in RH_\infty^{v\times m}$, which ensures that 

$$\|DHM(\hat{G},K)\|_\infty < 1,$$

can be described as 

$$K = DHM(\Omega^{-1},\Theta),$$

where $\Theta$ of a unitary bounded norm (Kimura, 1995; 1997) acts as a design 
parameter (see also Appendices B and C). Hence, we arrive at 

$$F_{ew} = DHM(\hat{G},K) = DHM(\Omega\Psi, DHM(\Omega^{-1},\Theta)) = DHM(\Psi,\Theta).$$

The discussed solution is schematically portrayed in Fig. 7.5. 

⁹ See Appendix C for the definition. 



Fig. 7.5. Parameterised solution to Problem 7.2, 
the filtering task for the dual model 


Now, a general theorem, concerning the solution to the task of optimal 
filtering w.r.t. H∞ and set up for the dual model, will be considered 
(Kongprawechnon and Kimura, 1998; Suchomski, 2001). Similarly to the previously 
shown development, this theorem will lead to a corresponding generalised 
eigenproblem. 

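Theorem 7.3. (The existence of a solution to Problem 7.2) Suppose that the 
basic model G of a dynamic plant is given, for which the 
standard conditions (c1)-(c6) hold. Then a suboptimal filter $K \in RH_\infty^{v\times m}$ 
satisfying $\|LF(G,K)\|_\infty < \gamma$ exists if and only if the following conditions are met. 
Let $\gamma = 1$ and let $\hat{G} = DCHAIN(G)$ denote the corresponding 
dual model. A dual $(J_{vm}, J_{vp})$-lossless factorisation $\hat{G} = \Omega\Psi$ of 
this model exists iff: 

(i) $(U_y, W_y) \in \mathrm{dom}(DRic)$ and $Y = DRic(U_y, W_y) \ge 0$, where the 
submatrices of the pair $(U_y, W_y)$ are described as 

$$P_y = \hat{A}^T - \hat{C}^T(\hat{D}J_y\hat{D}^T)^{-1}\hat{D}J_y\hat{B}^T,$$

$$Q_y = -\hat{B}J_y\big(J_y - \hat{D}^T(\hat{D}J_y\hat{D}^T)^{-1}\hat{D}\big)J_y\hat{B}^T,$$

$$R_y = -\hat{C}^T(\hat{D}J_y\hat{D}^T)^{-1}\hat{C},$$

where 

$$J_y = J_{vp}(1) \in \mathbb{R}^{(v+p)\times(v+p)},$$

(ii) $(\tilde{U}_y, \tilde{W}_y) \in \mathrm{dom}(DRic)$ and $\tilde{Y} = DRic(\tilde{U}_y, \tilde{W}_y) \ge 0$, where the 
submatrices of the pair $(\tilde{U}_y, \tilde{W}_y)$ have the following form: 

$$\tilde{P}_y = \hat{A}, \qquad \tilde{Q}_y = 0_{n\times n}, \qquad \tilde{R}_y = \hat{B}J_y\hat{B}^T,$$

(iii) $\sigma(Y\tilde{Y}) < 1$, 

(iv) there exists a non-singular¹⁰ matrix $M_y \in \mathbb{R}^{(v+m)\times(v+m)}$ such that 

$$E_y = M_y\tilde{J}_yM_y^T, \qquad (7.7)$$

where 

$$E_y = \hat{D}J_y\hat{D}^T - \hat{C}Y\hat{C}^T \in \mathbb{R}^{(v+m)\times(v+m)}, \qquad \tilde{J}_y = J_{vm}(1) \in \mathbb{R}^{(v+m)\times(v+m)}.$$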

Theorem 7.4. (The form of solutions to Problem 7.2) The factor fl G 
has the following structure: 



where 



Cl{z) 



■ A 


-{In-YY)-^B ■ 


_ C 


b+in 



My G 



i = i - Ry{I„ + YRy)-^YA e E"^", 

B = -{BJyt)'^ - AY C'^)Ey^ € J^nx(«+m)^ 

C = C- DJyB'^{ln + YRyy^YA e k(»'+™)x«, 
while My € lR(*'+'”)x(v+m) a nonsingular matrix solution to 

D{Jy + B^YBy^D'^ - C{ln - YYy^YC^ = MyJyM^ (7.8) 



¹⁰ See Appendices A and C. 
By referring Theorems 7.3 and 7.4 to the stated problem (7.6) of 
optimal FDI filtering based on the dual model $\hat{G}$, we can derive the following 
characteristics of the obtained solutions: 

Characteristics 7.2. 

1. It is clear that the application of Theorems 7.3 and 7.4 to solving 
Problem 7.2 requires, in general, performing the following scaling (normalisation) 
of the dual model $\hat{G}$: 

$$\hat{G}_\gamma(z) = \left[\begin{array}{c|cc} \hat{A} & \gamma\hat{B}_e & \hat{B}_w \\ \hline \hat{C}_u & \gamma\hat{D}_{ue} & \hat{D}_{uw} \\ \hat{C}_y & \gamma\hat{D}_{ye} & \hat{D}_{yw} \end{array}\right] = \left[\begin{array}{c|c} \hat{A} & \hat{B}_\gamma \\ \hline \hat{C} & \hat{D}_\gamma \end{array}\right],$$

with $\hat{B}_\gamma = \hat{B}$, $\forall\gamma$. 

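2. Since $\tilde{P}_y = \hat{A} = A_p$, and $\lambda(A_p) \subset \{z : |z| < 1\}$, it follows that the zero matrix 
$\tilde{Y} = 0_{n\times n}$ constitutes a solution to the corresponding DARE (condition (ii) 
of Theorem 7.3). 

3. With $\tilde{Y} = 0_{n\times n}$, the condition (iii) of Theorem 7.3 is always satisfied. 

4. The matrix 

$$\hat{D}_\gamma J_y\hat{D}_\gamma^T = \begin{bmatrix} -(1-\gamma^2)I_v & -F_p^T \\ -F_p & -(D_pD_p^T + F_pF_p^T) \end{bmatrix} \in \mathbb{R}^{(v+m)\times(v+m)}$$

is invertible: 

$$(\hat{D}_\gamma J_y\hat{D}_\gamma^T)^{-1} = \begin{bmatrix} -H_2 & H_2F_p^TH_1 \\ H_1F_pH_2 & -(H_1 + H_1F_pH_2F_p^TH_1) \end{bmatrix}.$$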

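5. With $\tilde{Y} = 0_{n\times n}$, we conclude that the equations (7.7) and (7.8) acquire a 
common form, which can be expressed as 

$$\tilde{M}_yE_y\tilde{M}_y^T = \tilde{J}_y, \qquad (7.9)$$

where $\tilde{M}_y = M_y^{-1} \in \mathbb{R}^{(v+m)\times(v+m)}$ stands for the required nonsingular 
solution, while 

$$E_y = \begin{bmatrix} E_{y11} & E_{y12} \\ E_{y12}^T & E_{y22} \end{bmatrix} = \hat{D}_\gamma J_y\hat{D}_\gamma^T - \hat{C}Y\hat{C}^T = \begin{bmatrix} -(1-\gamma^2)I_v & -F_p^T \\ -F_p & -(D_pD_p^T + F_pF_p^T + C_pYC_p^T) \end{bmatrix}. \qquad (7.10)$$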
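Assuming the following block-triangular form of the matrix $\tilde{M}_y$: 

$$\tilde{M}_y = \begin{bmatrix} \tilde{M}_{y11} & \tilde{M}_{y12} \\ 0_{m\times v} & \tilde{M}_{y22} \end{bmatrix}$$

yields the resultant solution of (7.9): 

$$\tilde{M}_{y11} = V_{y1}^{-T}, \qquad \tilde{M}_{y12} = \tilde{M}_{y11}E_{y12}\tilde{M}_{y22}^T\tilde{M}_{y22}, \qquad \tilde{M}_{y22} = V_{y2}^{-T},$$

where $V_{y1} \in \mathbb{R}^{v\times v}$ and $V_{y2} \in \mathbb{R}^{m\times m}$ denote the Cholesky factors of $\hat{E}_{y11} \in \mathbb{R}^{v\times v}$ 
and $-E_{y22} \in \mathbb{R}^{m\times m}$, respectively: 

$$V_{y1}^TV_{y1} = \hat{E}_{y11}, \qquad V_{y2}^TV_{y2} = -E_{y22},$$

while $\hat{E}_{y11} = E_{y11} - E_{y12}E_{y22}^{-1}E_{y12}^T$ is the Schur complement of $E_{y11} \in \mathbb{R}^{v\times v}$, 
being a submatrix of $E_y$. 

Moreover, it follows from (7.10) that $E_{y22} < 0$. This, in turn, implies 
that $V_{y2}$ always exists. Consequently, a sufficient condition for the existence 
of $\tilde{M}_y$ of the assumed block-triangular form is equivalent to the requirement 
of $\hat{E}_{y11} > 0$. 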

6. The presumed form (7.9) of the design equation turns out to be very 
convenient with respect to the task of deriving the state representation of the 
inversion $\Omega^{-1}$, which is required for the determination of $K = DHM(\Omega^{-1},\Theta)$. 
This is due to the fact that, by taking into account the identity $A = A_p$, we 
acquire 

Q,{z) = 



Ap 


-BM-^ 1 


c 


J 



where B =: -{B^Jybf^ - AYC^)Ey^ = -{BJyD^ - AYC^)Ey^. Conse- 
quently, we derive 



' Ap + BC 


B 




fill ( 2 ) 


fll2{z) 


MyC 


My _ 


Z 


_ ^2l{z) 


i^22{z) _ 






The required stable filter K G parameterised with 0 G BH^ 

takes thus the form 



xm 

oo 



= -(All - 0 ^ 21 ) (A 12 - 0 A 22 ). 

7. Clearly, by zeroing 0 = obtain the following simplest central 

solution: 



K = -n7}n 



11 



where 



Vtii{z) = 



+ byCp 
Myl2Cp 









yli 



-I 2 
Ap -h ^yCp 


1 

<cq| 


Myl2Cp 


1 

(M 



with 



Thus, by taking into account the fact that 



011^(2:) = 



ByE 



Ap + ByCp — B^My^^Myi2Cp 


BuM-,\ ■ 


1 

to 


J 



we gain the following non-minimal representation of the requested filter: 





Ap My 12 By)Cp 


Ru^yU^yl^Cp 




K{z) = 


On X n 


Ap 4- By Op 


4 




^yll^yT^^Cp 


~^yll^yl2Cp 





As can be easily seen, by employing a similarity transformation described with 



the matrix 



In In 
Onxn In 



, the above non-minimal representation can be trans- 



formed into the following one: 





Ap KyCp 


^nxn 


1 


K{z)^ 


Onxn Ap 


+ ByCp 


By 




^yll^yl^Cp 


Opxn 





where 

Ky = . 

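By discarding a non-observable part of this filter representation, we derive 
the final simplified model of the required filter along with its corresponding 
innovation form of easily implementable state-space equations: 

$$K(z) = \left[\begin{array}{c|c} A_p - K_yC_p & K_y \\ \hline \tilde{M}_{y11}^{-1}\tilde{M}_{y12}C_p & -\tilde{M}_{y11}^{-1}\tilde{M}_{y12} \end{array}\right],$$

$$\begin{cases} x_f(k+1) = A_px_f(k) + K_y\,(y(k) - C_px_f(k)), \\ u(k) = \tilde{M}_{y11}^{-1}\tilde{M}_{y12}C_px_f(k) - \tilde{M}_{y11}^{-1}\tilde{M}_{y12}\,y(k). \end{cases}$$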
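
The block-triangular solution of the design equation (7.9), discussed in item 5 of Characteristics 7.2, amounts to two Cholesky factorisations and one Schur complement. The following Python/NumPy sketch illustrates this for a given symmetric matrix E with a negative definite lower-right block and a positive definite Schur complement. The function name `block_triangular_factor` is an assumption of this example; note that `numpy.linalg.cholesky` returns a lower-triangular factor, which is transposed here to match the convention V^T V used in the text.

```python
import numpy as np

def block_triangular_factor(E, v):
    """Find upper block-triangular M with M E M^T = diag(I_v, -I_m)."""
    m = E.shape[0] - v
    E11, E12, E22 = E[:v, :v], E[:v, v:], E[v:, v:]
    # Schur complement of the upper-left block.
    E11_hat = E11 - E12 @ np.linalg.solve(E22, E12.T)
    # Cholesky factors V1^T V1 = E11_hat and V2^T V2 = -E22 (both must be positive definite).
    V1 = np.linalg.cholesky(E11_hat).T
    V2 = np.linalg.cholesky(-E22).T
    M11 = np.linalg.inv(V1).T            # V1^{-T}
    M22 = np.linalg.inv(V2).T            # V2^{-T}
    M12 = M11 @ E12 @ M22.T @ M22        # equals -M11 E12 E22^{-1}
    return np.block([[M11, M12], [np.zeros((m, v)), M22]])
```

If either Cholesky factorisation fails, the assumed block-triangular solution does not exist for the given matrix, which mirrors the positivity requirement on the Schur complement stated above.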

For a practical interpretation of the above results, we shall also provide 
the following two inferences: 

1. Similarly to the previous development, we observe that the analysed 
FDI problem also requires only one discrete-time algebraic Riccati equation 
to be solved. 
2. Using the dual model $\hat{G}$ facilitates the parameterisation of the set of 
filter solutions $K = DHM(\Omega^{-1},\Theta)$, as compared to the set of filters K in the 
LF form (Theorem 7.2) obtained for the basic model G. The dual lossless factorisation 
$\hat{G} = \Omega\Psi$ of the dual model $\hat{G}$ can also be used in an extended filtration 
algorithm when the plant G has zeros on the unit circle. 



7.2.5. FDI filtering with the instrumental reference signal 

Let us consider yet another, more complex scheme of the FDI system,
in which an instrumental reference signal $y_s \in \mathbb{R}^s$ is formed via appropriate
filtering of a pattern fault signal $f_p$ (Suchomski and Kowalczuk, 2000; 2001).
A scheme of such a system is illustrated in Fig. 7.6, where it is assumed that
the reference $y_s$ is generated by a block described in the state space:

$$S:\ \begin{cases} x_s(k+1) = A_sx_s(k) + B_sf_p(k), \\ y_s(k) = C_sx_s(k) + D_sf_p(k), \end{cases}$$

with $x_s \in \mathbb{R}^{n_s}$ acting as a filter state vector. The matrices of this model are
appropriately dimensioned as $A_s \in \mathbb{R}^{n_s\times n_s}$, $B_s \in \mathbb{R}^{n_s\times v}$, $C_s \in \mathbb{R}^{s\times n_s}$ and
$D_s \in \mathbb{R}^{s\times v}$. A diagonal matrix $A_s$ with $n_s = s = v$ can, for instance, be
interpreted as a model for forming the reference $y_s$ via autonomous channels
of one-pole filters acting on the elements of the pattern fault $f_p$. A question
can be posed: what is the motivation for this scheme? The source of this
type of filtration can be found in an attempt to incorporate certain prior
knowledge on the monitored faults into FDI algorithm design (Edelmayer
and Bokor, 1998; Chen and Patton, 1999a; 2000).

As can be seen from Fig. 7.6, the FDI filter $K\colon y_p \mapsto r$, being the subject of
our optimisation procedure, generates a residue $r \in \mathbb{R}^s$, which is compared
with the instrumental reference $y_s$. Thus the error of the fault estimate takes
the form

$$e = y_s - r \in \mathbb{R}^s.$$
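The diagonal case mentioned above (independent one-pole channels, $n_s = s = v$) can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, and the choices $B_s = I - A_s$, $C_s = I$, $D_s = 0$ (unit steady-state gain per channel) are made here for the example and are not prescribed by the text.

```python
import numpy as np

def reference_signal(fp, poles):
    """Generate y_s from the pattern fault f_p with one-pole channels.

    fp    : array of shape (N, v), samples of the pattern fault
    poles : length-v sequence, diagonal of A_s (each |pole| < 1)
    Assumed structure: B_s = I - A_s, C_s = I, D_s = 0.
    """
    poles = np.asarray(poles, dtype=float)
    v = len(poles)
    As, Bs = np.diag(poles), np.diag(1.0 - poles)
    Cs, Ds = np.eye(v), np.zeros((v, v))
    xs = np.zeros(v)
    ys = np.zeros_like(fp, dtype=float)
    for k in range(fp.shape[0]):
        ys[k] = Cs @ xs + Ds @ fp[k]      # y_s(k) = C_s x_s(k) + D_s f_p(k)
        xs = As @ xs + Bs @ fp[k]         # x_s(k+1) = A_s x_s(k) + B_s f_p(k)
    return ys

# A unit-step pattern fault passed through a single channel with pole 0.5
ys = reference_signal(np.ones((3, 1)), [0.5])
```

With these choices the reference is a smoothed copy of the pattern fault, so the error $e = y_s - r$ penalises the residue for not following the expected fault shape.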




Fig. 7.6. Extended FDI-filtration scheme with the instrumental reference signal 
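
It is easy to check that the generalised plant $G$ with a properly extended
state vector,

$$x = \begin{bmatrix} x_p \\ x_s \end{bmatrix} \in \mathbb{R}^{n+n_s},$$

can be characterised with the aid of the following matrices:

$$A = \begin{bmatrix} A_p & 0_{n\times n_s} \\ 0_{n_s\times n} & A_s \end{bmatrix} \in \mathbb{R}^{(n+n_s)\times(n+n_s)},$$

$$B = [\,B_w\ \ B_u\,] \in \mathbb{R}^{(n+n_s)\times(p+s)}, \quad B_w = \begin{bmatrix} B_p & E_p \\ 0_{n_s\times q} & B_s \end{bmatrix}, \quad B_u = \begin{bmatrix} 0_{n\times s} \\ 0_{n_s\times s} \end{bmatrix},$$

$$C = \begin{bmatrix} C_e \\ C_y \end{bmatrix}, \quad C_e = [\,0_{s\times n}\ \ C_s\,] \in \mathbb{R}^{s\times(n+n_s)}, \quad C_y = [\,C_p\ \ 0_{m\times n_s}\,] \in \mathbb{R}^{m\times(n+n_s)},$$

$$D = \begin{bmatrix} D_{ew} & D_{eu} \\ D_{yw} & D_{yu} \end{bmatrix}, \quad D_{ew} = [\,0_{s\times q}\ \ D_s\,] \in \mathbb{R}^{s\times p}, \quad D_{eu} = -I_s,$$

$$D_{yw} = [\,D_p\ \ F_p\,] \in \mathbb{R}^{m\times p}, \quad D_{yu} = 0_{m\times s}.$$

The corresponding operator model $G \in R^{(s+m)\times(p+s)}$ takes the form
of (7.3) with

$$G_{ew}(z) = [\,0_{s\times q}\ \ S(z)\,], \quad G_{eu}(z) = -I_s,$$

$$G_{yw}(z) = [\,P_d(z)\ \ P_f(z)\,], \quad G_{yu}(z) = 0_{m\times s},$$

where

$$S(z) = C_s(zI_{n_s} - A_s)^{-1}B_s + D_s.$$

The transfer function $F_{ew}\colon w \mapsto e$ of the resulting closed-loop system
(Fig. 7.6 and 7.2) has an affine form with respect to the filter $K$:

$$F_{ew} = LF(G, K) = [\,0_{s\times q}\ \ S\,] - K\,[\,P_d\ \ P_f\,] \in R^{s\times p}.$$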
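The block structure of the generalised plant and the affine form of $F_{ew}$ can be checked numerically at a single frequency point. The sketch below assembles $(A, B, C, D)$ with the input ordering $[d_p; f_p; u]$ and the output ordering $[e; y_p]$, and compares the standard lower linear fractional transformation (assumed here to be what $LF$ denotes) with the affine expression. All numerical values and function names are hypothetical.

```python
import numpy as np

def tf(A, B, C, D, z):
    """Evaluate C (zI - A)^{-1} B + D at a complex point z."""
    return C @ np.linalg.solve(z * np.eye(A.shape[0]) - A, B) + D

def generalised_plant(Ap, Bp, Ep, Cp, Dp, Fp, As, Bs, Cs, Ds):
    """Assemble G with inputs [d_p; f_p; u] and outputs [e; y_p]."""
    n, q = Bp.shape
    ns, _ = Bs.shape
    m, s = Cp.shape[0], Cs.shape[0]
    A = np.block([[Ap, np.zeros((n, ns))], [np.zeros((ns, n)), As]])
    B = np.block([[Bp, Ep, np.zeros((n, s))],
                  [np.zeros((ns, q)), Bs, np.zeros((ns, s))]])
    C = np.block([[np.zeros((s, n)), Cs], [Cp, np.zeros((m, ns))]])
    D = np.block([[np.zeros((s, q)), Ds, -np.eye(s)],
                  [Dp, Fp, np.zeros((m, s))]])
    return A, B, C, D

# Hypothetical toy data: n = 2, q = 1, v = 1, m = 2, n_s = s = 1 (so p = 2)
rng = np.random.default_rng(0)
Ap = np.array([[0.5, 0.1], [0.0, 0.3]])
Bp, Ep = rng.standard_normal((2, 1)), rng.standard_normal((2, 1))
Cp = rng.standard_normal((2, 2))
Dp, Fp = rng.standard_normal((2, 1)), rng.standard_normal((2, 1))
As, Bs = np.array([[0.8]]), np.array([[0.2]])
Cs, Ds = np.array([[1.0]]), np.array([[0.0]])
s, m, p = 1, 2, 2

A, B, C, D = generalised_plant(Ap, Bp, Ep, Cp, Dp, Fp, As, Bs, Cs, Ds)
z0 = np.exp(0.3j)
Gz = tf(A, B, C, D, z0)
Gew, Geu, Gyw, Gyu = Gz[:s, :p], Gz[:s, p:], Gz[s:, :p], Gz[s:, p:]

K = rng.standard_normal((s, m))            # a constant stand-in for the filter
Few_lft = Gew + Geu @ K @ np.linalg.solve(np.eye(m) - Gyu @ K, Gyw)

S = tf(As, Bs, Cs, Ds, z0)
P = tf(Ap, np.hstack([Bp, Ep]), Cp, np.hstack([Dp, Fp]), z0)   # [P_d  P_f]
Few_affine = np.hstack([np.zeros((s, 1)), S]) - K @ P
```

Because $G_{yu} = 0$, the feedback term of the transformation disappears, which is exactly what makes $F_{ew}$ affine in $K$.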

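On the other hand, a simple computation shows that the corresponding
dual model $\tilde G \in R^{(s+m)\times(s+p)}$ of the generalised plant with a properly
enhanced state $\tilde x \in \mathbb{R}^{n+n_s}$ has the form given by (7.5), where

$$\tilde A = A \in \mathbb{R}^{(n+n_s)\times(n+n_s)}, \quad \tilde B = [\,\tilde B_e\ \ \tilde B_w\,] = [\,0_{(n+n_s)\times s}\ \ B_w\,] \in \mathbb{R}^{(n+n_s)\times(s+p)},$$

$$\tilde C = \begin{bmatrix} \tilde C_u \\ \tilde C_y \end{bmatrix} = \begin{bmatrix} C_e \\ C_y \end{bmatrix}, \quad \tilde D = \begin{bmatrix} \tilde D_{ue} & \tilde D_{uw} \\ \tilde D_{ye} & \tilde D_{yw} \end{bmatrix} = \begin{bmatrix} -I_s & D_{ew} \\ 0_{m\times s} & D_{yw} \end{bmatrix},$$

and

$$\tilde G_{ue}(z) = -I_s, \quad \tilde G_{uw}(z) = G_{ew}(z), \quad \tilde G_{ye}(z) = 0_{m\times s}, \quad \tilde G_{yw}(z) = G_{yw}(z).$$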



Let us consider some constraints that directly follow from the previously
developed conditions (c1)-(c6), respectively:

(c1) since the matrices $A_p$ and $A_s$ have to be stable, it is required that
$P \in RH_\infty^{m\times p}$ and $S \in RH_\infty^{s\times v}$,

(c2) with the condition (c1) being satisfied, the condition (c2)
holds for every $C_p$,

(c3) this condition is always valid,

(c4) the matrix $D_{yw}$ should be of full row rank: $\mathrm{rank}\,D_{yw} = m$, $m \le p$,

(c5) the poles of $P(z)$ and $S(z)$ cannot be placed on $o(z)$, i.e., it is claimed
that $P \in RL_\infty^{m\times p}$ and $S \in RL_\infty^{s\times v}$, which is somewhat weaker as compared
to (c1),

(c6) $P(z)$ should not have zeros on $o(z)$, whereas, in general, there is no
such prerequisite for $S(z)$.



In the discussed case of FDI filtering with the reference signal $y_s$, the
suboptimal filter $K \in RH_\infty^{s\times m}$ of the order $(n + n_s)$ can be easily derived by
employing the previously developed Theorem 7.1 or 7.2. It is also worth noticing
that now only one Riccati equation has to be solved. However, this remark
is obvious only for the method associated with the dual model: due to
the fact that $P_y = A$ is stable and $Q_y = 0_{(n+n_s)\times(n+n_s)}$, it follows that
$Y = 0_{(n+n_s)\times(n+n_s)}$ satisfies the corresponding (second) Riccati equation.
On the other hand, if one employs the method based on the basic model $G$,
a simple additional argument is necessary to justify the fact that the
zero solution $X = 0_{(n+n_s)\times(n+n_s)}$ solves the first Riccati equation.

With this in mind, it is easily verified that the following equation is 
satisfied: 





1 

1 

to 


OqXV 


Gqxs 


-1 


Gqxn 


HqXUa 


Gyxq 


DjD, - 7 V, 


-DJ 




Gyxn 


DjCs 




Gsxq 


-Ds 


Is 




Ggxn 


-Cs 



Gqxn Gqxus 
Gyxn GyXUs 
G sxn Gg 







from which it follows that 

B(^D JxD) D JxC = 

C^J^D{D^J^Dy^D^J^C = 

On account of the above we conclude that $P_x = A$ and $Q_x = 0_{(n+n_s)\times(n+n_s)}$.
Hence, in fact, $X = 0_{(n+n_s)\times(n+n_s)}$ can be regarded as
the required solution. From this we also conclude that the other approach,
founded on the dual model, in which we directly declare the zero value
of $Q_y$, is better conditioned$^{11}$ numerically than the approach based on the
basic model $G$.



On 



On 



''nxn ^nxria 

Onaxn CjCs 



7.3. Synthesis of primary and secondary residual vectors 

With reference to the previously utilised model (7.1), let us now consider the
following discrete-time model extended by the presence of a control signal
$u_p \in \mathbb{R}^{n_u}$:

$$P:\ \begin{cases} x_p(k+1) = A_px_p(k) + B_{pu}u_p(k) + B_pd_p(k) + E_pf_p(k), \\ y_p(k) = C_px_p(k) + D_{pu}u_p(k) + D_pd_p(k) + F_pf_p(k), \end{cases} \tag{7.11}$$

where $B_{pu} \in \mathbb{R}^{n\times n_u}$ and $D_{pu} \in \mathbb{R}^{m\times n_u}$. Moreover, assume that $(A_p, C_p)$ is
completely observable.

A full-order observer (Liu and Patton, 1998; Kowalczuk and Suchomski,
1998) is described by the following equation:

$$\hat x_p(k+1) = A_p\hat x_p(k) + B_{pu}u_p(k) + K_o\,(y_p(k) - \hat y_p(k)),$$

where $K_o \in \mathbb{R}^{n\times m}$ is the observer gain, and

$$\hat y_p(k) = C_p\hat x_p(k) + D_{pu}u_p(k) \in \mathbb{R}^m$$

denotes an estimate of the output of the plant $P$. This system yields an
estimate $\hat x_p \in \mathbb{R}^n$ of the observed plant state. By introducing the
state-transition matrix of this observer as

$$A_o = A_p - K_oC_p \in \mathbb{R}^{n\times n},$$

we acquire a convenient model of the full-order observer:

$$\hat x_p(k+1) = A_o\hat x_p(k) + (B_{pu} - K_oD_{pu})u_p(k) + K_oy_p(k). \tag{7.12}$$



$^{11}$ We also say that this solution is numerically more robust, or has better numerical
robustness.

The evolution of the state-estimation error defined as

$$x_e = x_p - \hat x_p \in \mathbb{R}^n$$

is described by the following equation:

$$x_e(k+1) = A_ox_e(k) + (B_p - K_oD_p)d_p(k) + (E_p - K_oF_p)f_p(k). \tag{7.13}$$

Clearly, we should assume that the matrix $A_o$ is stable.

On the other hand, the output-estimation error, called the original residual
vector,

$$y_e = y_p - \hat y_p \in \mathbb{R}^m,$$

can be expressed as

$$y_e(k) = C_px_e(k) + D_pd_p(k) + F_pf_p(k).$$
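The error recursion (7.13) and the expression for $y_e$ can be confirmed by simulating the plant (7.11) and the observer (7.12) side by side. The sketch below uses randomly generated matrices and signals purely for illustration; none of the numerical values come from the chapter, and the gain `Ko` is arbitrary rather than designed.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, q, v, nu = 3, 2, 2, 1, 1
Ap = 0.3 * rng.standard_normal((n, n))
Bpu, Bp, Ep = (rng.standard_normal((n, k)) for k in (nu, q, v))
Cp = rng.standard_normal((m, n))
Dpu, Dp, Fp = (rng.standard_normal((m, k)) for k in (nu, q, v))
Ko = 0.1 * rng.standard_normal((n, m))     # arbitrary gain, for the identity check only
Ao = Ap - Ko @ Cp

xp, xhat = rng.standard_normal(n), np.zeros(n)
xe = xp - xhat                              # x_e(0)
max_gap = 0.0
for k in range(30):
    u = rng.standard_normal(nu)
    d = rng.standard_normal(q)
    f = rng.standard_normal(v)
    yp = Cp @ xp + Dpu @ u + Dp @ d + Fp @ f
    ye = yp - (Cp @ xhat + Dpu @ u)         # original residual y_p - y_p_hat
    max_gap = max(max_gap, np.abs(ye - (Cp @ xe + Dp @ d + Fp @ f)).max())
    xp = Ap @ xp + Bpu @ u + Bp @ d + Ep @ f                # plant (7.11)
    xhat = Ao @ xhat + (Bpu - Ko @ Dpu) @ u + Ko @ yp       # observer (7.12)
    xe = Ao @ xe + (Bp - Ko @ Dp) @ d + (Ep - Ko @ Fp) @ f  # error model (7.13)
    max_gap = max(max_gap, np.abs((xp - xhat) - xe).max())
```

The control signal $u_p$ cancels out of the error dynamics, so `max_gap` stays at rounding level regardless of the control sequence applied.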

Let $W \in \mathbb{R}^{w\times m}$ denote the weighting matrix which is the subject of
design and which allows defining (and processing) the ensuing primary residual
vector:

$$y_w = Wy_e \in \mathbb{R}^w. \tag{7.14}$$

It is clear that the vector $y_w$ can be interpreted as a certain observation of
the state estimation error $x_e$ in the presence of additive disturbances $d_p$
and $f_p$.

With the zero initial conditions $x_e(0) = 0_n$, the solution of the
equation (7.13) takes the following operator form:

$$x_e(z) = T_o(B_p - K_oD_p)d_p + T_o(E_p - K_oF_p)f_p,$$

$$T_o = (zI_n - A_o)^{-1} \in RH_\infty^{n\times n}.$$

On the basis of the above, the resulting 'internal' operator representation of
the primary residual vector manifests itself as

$$y_w = F_{wd}d_p + F_{wf}f_p,$$

with the following stable matrix transfer functions:

$$F_{wd} = WC_pT_o(B_p - K_oD_p) + WD_p \in RH_\infty^{w\times q},$$

$$F_{wf} = WC_pT_o(E_p - K_oF_p) + WF_p \in RH_\infty^{w\times v}.$$

With reference to the primary residual vector, a classical problem of
($d_p$-)disturbance decoupling can be declared as the problem of the synthesis of a
weighting matrix $W$ for which $F_{wd}$ is minimised and $F_{wf}$ is maximised (for
the purpose of suitable fault detection). Thus a rational trade-off between
disturbance attenuation and detectability of this FDI filter is a principal
question (Patton and Chen, 1991; Chen and Patton, 1999a; Gertler, 1998;
Kowalczuk and Suchomski, 1999). To obtain ideal disturbance decoupling,
we should demand $F_{wd} = 0_{w\times q}$ with a non-zero $F_{wf}$. With a more realistic
approach to the synthesis of the weighting matrix $W$, based on a geometric
perspective of the problem of disturbance decoupling, we should assume only
partial (suboptimal) decoupling of the objective residual vector $y_w$ (Liu and
Patton, 1998; Chen and Patton, 1999a; Kowalczuk and Suchomski, 1999).

In general, however, seeking effective solutions to the partial decoupling
problem requires extra knowledge about the structure of the disturbances $d_p$.
Taking, for example,

$$d_p = \begin{bmatrix} d_{pd} \\ d_{pn} \end{bmatrix}, \tag{7.15a}$$

$$B_p = [\,B_{pd}\ \ 0_{n\times n_n}\,], \quad B_{pd} \in \mathbb{R}^{n\times n_d}, \tag{7.15b}$$

$$D_p = [\,0_{m\times n_d}\ \ D_{pn}\,], \quad D_{pn} \in \mathbb{R}^{m\times n_n}, \tag{7.15c}$$

where $d_{pd} \in \mathbb{R}^{n_d}$ is the plant disturbance, $d_{pn} \in \mathbb{R}^{n_n}$ denotes the
measurement noise, and $n_d + n_n = q$, we obtain the following matrix transfer
function:

$$F_{wd}(z) = [\,WC_pT_oB_{pd}\ \ \ W(I_m - C_pT_oK_o)D_{pn}\,], \tag{7.16}$$

with two autonomous channels processing the plant disturbances and
measurement noises, respectively.
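The two-channel form (7.16) follows from substituting (7.15b) and (7.15c) into the general expression $F_{wd} = WC_pT_o(B_p - K_oD_p) + WD_p$. A quick numerical confirmation at one point on the unit circle is sketched below; all matrices are random placeholders, not data from the chapter.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, w, nd, nn = 3, 2, 1, 1, 2
Ap = 0.3 * rng.standard_normal((n, n))
Cp = rng.standard_normal((m, n))
Ko = 0.1 * rng.standard_normal((n, m))
W = rng.standard_normal((w, m))
Bpd = rng.standard_normal((n, nd))
Dpn = rng.standard_normal((m, nn))

Bp = np.hstack([Bpd, np.zeros((n, nn))])   # (7.15b)
Dp = np.hstack([np.zeros((m, nd)), Dpn])   # (7.15c)

z = np.exp(0.7j)
To = np.linalg.inv(z * np.eye(n) - (Ap - Ko @ Cp))   # T_o = (zI - A_o)^{-1}

F_general = W @ Cp @ To @ (Bp - Ko @ Dp) + W @ Dp
F_partitioned = np.hstack([W @ Cp @ To @ Bpd,
                           W @ (np.eye(m) - Cp @ To @ Ko) @ Dpn])   # (7.16)
```

The first block depends on the plant disturbance only through $B_{pd}$, which is the block that the eigenstructure design described next tries to make small.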

A particular solution for the partial protection of the residual vector
against the influence of plant disturbances $d_{pd}$ modelled in such
a way, which is based on the suboptimal shaping of the transfer matrix
$WC_pT_oB_{pd}$, was presented in (Suchomski and Kowalczuk, 1999;
Kowalczuk and Suchomski, 1999) and can be found in Chapter 6. The
corresponding numerical algorithm employs the following principles
concerning the method of shaping the eigenvectors of the observer state-transition
matrix $A_o$:

a) some eigenvectors from attainable subspaces are to be chosen so as to
secure a minimal distance to a perfect decoupling subspace (an FDI aspect),

b) the remaining eigenvectors ought to be chosen so as to assure a
minimal distance to the subspace of possibly excellent robustness properties (a
numerical aspect).

As a consequence of the above procedure, we derive the matrices $A_o$ and
$K_o$, describing the state observer (7.12), as well as the weighting matrix $W$,
which generates the primary residual vector (7.14). By interpreting this vector
as an effective observation of the fault $f_p$, we can easily find the resultant
internal model of a primary residual generator, expressed in the space of
states $x_w = x_e$:

$$\begin{cases} x_w(k+1) = A_ox_w(k) + (B_p - K_oD_p)d_p(k) + (E_p - K_oF_p)f_p(k), \\ y_w(k) = WC_px_w(k) + WD_pd_p(k) + WF_pf_p(k). \end{cases}$$

It is obvious that the above model has the structure (7.1), which has been
assumed for the monitored plant $P$ in the previous section. Thus we conclude
that the methods of the optimisation of a secondary residue generator $K$
developed there can be directly employed to construct the relation $K\colon y_w \mapsto r$,
the purpose of which is to create a secondary residual vector $r$ of improved
detectability properties.



7.4. Numerical example 



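Let us consider an unstable control channel of a continuous-time dynamic
plant described by the following matrix parameters of its state-space
equation:

$$A_t = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 2 & 1 & -2 \end{bmatrix} \quad\text{and}\quad B_t = \begin{bmatrix} 0 \\ 0 \\ 10 \end{bmatrix}.$$

Furthermore, let us discretise this system as equipped with the standard
Zero-Order Hold (ZOH) at the sampling period $T_0 = 0.05$ s. We shall also make
use of the 'structure' of the disturbances $d_p$ characterised by (7.15)-(7.16)
with $n_n = 2$ and $n_d = 1$. All these assumptions lead to the discrete-time
model $P$ of (7.11) with the following matrix parameters:

$$A_p = \begin{bmatrix} 1.0000 & 0.0500 & 0.0012 \\ 0.0024 & 1.0013 & 0.0476 \\ 0.0952 & 0.0500 & 0.9060 \end{bmatrix}, \quad B_{pu} = \begin{bmatrix} 0.0002 \\ 0.0121 \\ 0.4760 \end{bmatrix},$$

$$B_p = \left[\begin{array}{c|cc} 1 & 0 & 0 \\ 1.5 & 0 & 0 \\ 0 & 0 & 0 \end{array}\right], \quad E_p = \begin{bmatrix} 0.0 \\ 0.0 \\ 0.5 \end{bmatrix}, \quad C_p = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix},$$

$$D_{pu} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \quad D_p = \left[\begin{array}{c|cc} 0 & 1 & 0 \\ 0 & 0 & 1 \end{array}\right], \quad F_p = \begin{bmatrix} 1 \\ 0 \end{bmatrix}.$$

The above plant is subject to the stabilising state feedback $u_p(k) = -K_cx_p(k)$
with the gain $K_c = [\,3.2410\ \ 2.9227\ \ 0.6972\,]$, which implements
the following spectrum$^{12}$ of the closed-loop system:
$\lambda(A_p - B_{pu}K_c) = \{0.8465, 0.8465, 0.8465\}$.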
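The discrete-time matrices $A_p$ and $B_{pu}$ can be reproduced from $A_t$ and $B_t$ with a zero-order-hold discretisation. The sketch below uses the augmented-matrix exponential and a plain truncated Taylor series (adequate here because the scaled matrix has a small norm); the helper names are hypothetical. Since a triple eigenvalue is very sensitive to rounding of the printed data, only the mean of the closed-loop spectrum (trace divided by three) is evaluated.

```python
import numpy as np

def expm_series(M, terms=30):
    """Matrix exponential by a truncated Taylor series (fine for small ||M||)."""
    out = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        out = out + term
    return out

def zoh(At, Bt, T):
    """ZOH discretisation: exp([[At, Bt], [0, 0]] T) = [[Ad, Bd], [0, I]]."""
    n, r = Bt.shape
    M = np.zeros((n + r, n + r))
    M[:n, :n] = At * T
    M[:n, n:] = Bt * T
    E = expm_series(M)
    return E[:n, :n], E[:n, n:]

At = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [2.0, 1.0, -2.0]])
Bt = np.array([[0.0], [0.0], [10.0]])
Ap, Bpu = zoh(At, Bt, 0.05)

Kc = np.array([[3.2410, 2.9227, 0.6972]])
mean_closed_loop_pole = np.trace(Ap - Bpu @ Kc) / 3.0
```

The continuous-time matrix $A_t$ has the characteristic polynomial $(s-1)(s+1)(s+2)$, which confirms that the open-loop channel is unstable.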



$^{12}$ The set of the poles of the closed-loop system, or the set of the eigenvalues of the
system's state matrix.







The observer characterised by the set of observation eigenvalues $\lambda(A_o) =
\{0.2, 0.2, 0.75\}$, as well as the decoupling filter represented by the weighting
matrix $W$, has been designed with the application of the methodology presented
in Chapter 6. Note that we are coping here with a partial problem of shaping
the observer eigenstructure, because the spectrum of the observer state
matrix is not the goal of our treatment. A simple example of coping with this
issue can be found, for instance, in (Suchomski and Kowalczuk, 1999).

Here, having adopted the notation introduced in Chapter 6, we
omit all technical details concerning the evaluation of the corresponding
measures of both the disturbance decoupling and orthogonality of the attainable
left eigenvectors of $A_o$.

The results of disturbance-decoupling optimisation obtained by the
presented method are as follows:

• the weighting row matrix for the primary residual signal ($w = 1$):

$$W = [\,-0.7978\ \ 0.5319\,],$$

• the parameters of the attainable left eigenvectors of the observer state
matrix $A_o$:

$$\left[\begin{array}{c|cc} 0.6193 & 0.1433 & -0.1340 \\ -0.4258 & 0.4757 & -0.0076 \end{array}\right],$$

• the resultant attainable left eigenvectors of $A_o$:

$$L_o = \left[\begin{array}{c|cc} -0.7386 & -0.0775 & 0.1700 \\ 0.5971 & -0.5364 & -0.1959 \\ -0.3130 & -0.8404 & 0.9658 \end{array}\right],$$

• the effective observer gain:

$$K_o = \begin{bmatrix} 0.9456 & -0.1176 \\ 0.1321 & 0.6519 \\ -0.0010 & 0.1608 \end{bmatrix}.$$

Based on the consecutive $H_\infty$-filtration of the primary residual
signal yielded by the corresponding generalised dynamic plant in the form
of the basic model $G$, the discussed problem (7.4) of the $H_\infty$-optimisation
of the secondary residual vector has been suitably solved. In particular, by
taking $\gamma = 1.1$ we have obtained the following filter $K$:

$$K(z) = \left[\begin{array}{ccc|c} 0.3442 & -0.0255 & -0.7302 & 0.3631 \\ -0.3928 & 0.5248 & -0.8241 & -0.3298 \\ 0.0932 & -0.1088 & 0.7452 & -0.0037 \\ \hline -0.2949 & 0.1966 & -0.0983 & -0.3697 \end{array}\right].$$

This solution has been verified by means of simulated experiments, in
which a fault has been represented by the following deterministic signal:

$$f_p(t) = \begin{cases} 0 & \text{for } t < 50\ \text{s}, \\ 1 & \text{for } 50\ \text{s} \le t \le 100\ \text{s}, \\ 0 & \text{for } t > 100\ \text{s}. \end{cases}$$


Case A

(z1) The plant disturbance $d_{pd} \in \mathbb{R}$, acting on the state vector, is a
stochastic process, obtained from a prototype zero-mean white-noise
Gaussian process with the standard deviation 0.75 passed through a first-order
shaping filter with its time constant equal to 2 s.

(z2) The measurement noise $d_{pn} \in \mathbb{R}^2$ is modelled as a stochastic vector
process, achieved from prototype zero-mean white-noise Gaussian processes
of dispersion 0.25 by employing the shaping filter of time constant 1 s.
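Test signals of this kind can be generated along the following lines. This is a rough sketch under explicit assumptions: the shaping filter is taken as the first-order lag $1/(\tau s + 1)$ discretised with the sampling period 0.05 s, the white-noise prototype is drawn per sample with the quoted deviation, and the function names are hypothetical. The chapter does not state how the prototype noise was scaled in discrete time, so the resulting variances are not claimed to match the original experiments.

```python
import numpy as np

def fault_signal(t):
    """Rectangular fault: 1 for 50 s <= t <= 100 s, 0 elsewhere."""
    t = np.asarray(t, dtype=float)
    return np.where((t >= 50.0) & (t <= 100.0), 1.0, 0.0)

def shaped_noise(n_samples, sigma, tau, Ts, rng):
    """White Gaussian noise passed through a discretised first-order lag.

    Assumed filter: d(k+1) = a d(k) + (1 - a) w(k), with a = exp(-Ts / tau).
    """
    a = np.exp(-Ts / tau)
    w = sigma * rng.standard_normal(n_samples)
    d = np.zeros(n_samples)
    for k in range(1, n_samples):
        d[k] = a * d[k - 1] + (1.0 - a) * w[k - 1]
    return d

Ts = 0.05
t = np.arange(0.0, 200.0, Ts)
rng = np.random.default_rng(0)
fp = fault_signal(t)
dpd = shaped_noise(t.size, 0.75, 2.0, Ts, rng)                  # as in (z1)
dpn = np.column_stack([shaped_noise(t.size, 0.25, 1.0, Ts, rng)
                       for _ in range(2)])                      # as in (z2)
```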

The illustrative results of the performed simulations are shown in Fig. 7.7,
which concerns the detection of the fault $f_p$ in the presence of the
disturbances $d_{pd}$ and $d_{pn}$. The first (a) and the second (b) coordinate of the
original vector error of the output estimate $y_e = y_p - \hat y_p \in \mathbb{R}^2$, as well as the
primary (c) residual vector $y_w = Wy_e \in \mathbb{R}$ and the secondary (d) residual
vector $r = Ky_w \in \mathbb{R}$, are shown there.

Case B

(z3) The scalar plant disturbance $d_{pd} \in \mathbb{R}$ is built as a sum of a
deterministic square wave of magnitude 0.1 shown in Fig. 7.8 and a stochastic
process resulting from a prototype zero-mean white-noise Gaussian process
of dispersion 0.5 processed by the shaping filter with the time constant 2 s.

(z4) The measurement noise $d_{pn} \in \mathbb{R}^2$ is modelled as a stochastic vector
process, shaped by filtering prototype zero-mean white-noise Gaussian
processes of dispersion 0.2 via the one-pole system with the one-second time
constant.

The results of the simulation experiments performed for the purpose of
the evaluation of the detection abilities of the examined residue generator
with respect to the fault $f_p$ in the presence of the disturbances $d_{pd}$ and $d_{pn}$
are depicted in Fig. 7.9. We can see there the first (a) and the second (b)
coordinate of the original vector error of the output estimate $y_e = y_p - \hat y_p \in \mathbb{R}^2$,
as well as the primary (c) residual vector $y_w = Wy_e \in \mathbb{R}$ and the
secondary (d) residual vector $r = Ky_w \in \mathbb{R}$.

Fig. 7.7. Experimental course of residual signals: vector coordinates of the original
residue (a) and (b); primary (c) and secondary (d) residual vector

Fig. 7.8. Deterministic component of the plant disturbance $d_{pd}$

The performed computations confirm the claim that the secondary
$H_\infty$-optimal filtering of primary residual vectors is an effective method of
improving the properties of the designed FDI systems exposed to various disturbing
signals. Due to the filtering, the symptoms of the faults appearing in the
residual signals are made more apparent. They can thus be easily analysed
and/or identified by a detection mechanism.







Fig. 7.9. Experimental course of residual signals generated in the presence
of a deterministic square-wave disturbance: coordinates of the original
residue (a) and (b); primary (c) and secondary (d) residual vector



7.5. Summary 

In this chapter the problem of the mini-max optimisation of system
input-output relations based on a worst-case spectral approach has been discussed.
The issue of modelling uncertainty has not been considered. Such a
simplified routine has been applied purposely: by reducing the scope of
our deliberations, we could obtain useful algorithms that confine FDI system
design to solving one discrete-time algebraic Riccati equation (DARE), which is
presently treated as a standard numerical task.

Consequently, we have limited our study to computationally simple convex
optimisation problems with respect to the $H_\infty$-norm. Thus we have avoided,
for instance, certain difficult design issues connected with the synthesis of
detection filters of a reduced order that would require the use of methods
based, for example, on the so-called linear matrix inequalities (Boyd et
al., 1994).

In searching for an optimal design method of residual-vector generators,
we have focused our attention on the problem of disturbance decoupling and
two concepts of plant modelling. In the first approach we have considered
the basic model of the monitored dynamic object, while in the second one we
have employed a dual model of the plant. As has been mentioned above, in
both methods the optimisation procedures have been performed on the basis
of the DARE.

The attached examples of the construction of residue generators for fault
detection have illustrated the effectiveness of the proposed secondary
$H_\infty$ filtering of the primary residual vector in the process of the computation of
sharply outlined symptoms of faults, modelled as signals in plants subjected
to different disturbances.

Nevertheless, we have to keep in mind that the true
power of the $H_\infty$-norm approach comes into view when, apart from
uncertainty concerning process signals, one also admits structural and parametric
deviations in the object model. $H_\infty$ methods allow, in such circumstances,
obtaining robust minimax-optimal solutions of great practical importance.

Appendices 

Certain basic definitions and properties concerning discrete-time models of 
dynamic processes monitored by the designed FDI system are listed in four 
consecutive appendices given below, where fundamental facts about discrete- 
time algebraic Riccati equations are also supplied. These data both comple- 
ment and aid the reading of this chapter. 

A. Discrete-time models 

Let $A \in \mathbb{R}^{n\times n}$ and the corresponding set of all eigenvalues of $A$ be denoted
by $\lambda(A)$. Within discrete-time models, the matrix $A$ is said to be stable if
$\lambda(A) \subset v(z)$, where $v(z) = \{z \in \mathbb{C}: |z| < 1\}$ represents the open unit disk
of the complex plane (Brogan, 1991; Jury, 1964). The unit circle being the
boundary of $v(z)$ shall be denoted by $o(z) = \{z \in \mathbb{C}: |z| = 1\}$.

A finite-dimensional, linear, causal and time-invariant$^{13}$ dynamic system
(object, plant) characterised by its operator model, a transfer function $G(z)$,
is BIBO-stable if all the poles of $G(z)$ belong to $v(z)$.

$^{13}$ With respect to the discrete time $k$.

Consider a system $G$ described by the following equations in the state
space with the use of properly dimensioned matrices $A$, $B$, $C$ and $D$:

$$G:\ \begin{cases} x(k+1) = Ax(k) + Bu(k), \\ y(k) = Cx(k) + Du(k). \end{cases} \tag{7.A1}$$

This system, representing the relation between the input signal $u(k)$ and
the output signal $y(k)$ by employing a properly defined state vector $x(k)$, is
asymptotically stable if and only if (iff) its state transition matrix $A$ is stable.
The corresponding transfer function matrix $G(z) = C(zI - A)^{-1}B + D$,
where $I$ is an identity matrix, can be written in the following state-space-related
compact form:

$$G(z) = \left[\begin{array}{c|c} A & B \\ \hline C & D \end{array}\right]. \tag{7.A2}$$



A Hermitian conjugate of $G(z)$ is defined by the transposition (T) of the
complex conjugate image of $G(z)$ as

$$G^{*}(z) = G(z^{*})^{T}, \tag{7.A3}$$

while a para-Hermitian conjugate system results from the following
transposition:

$$G^{\sim}(z) = G(1/z)^{T}. \tag{7.A4}$$

With a nonsingular$^{14}$ matrix $A$, the para-Hermitian conjugation can be
expressed as follows:

$$G^{\sim}(z) = \left[\begin{array}{c|c} A^{-T} & A^{-T}C^{T} \\ \hline -B^{T}A^{-T} & D^{T} - B^{T}A^{-T}C^{T} \end{array}\right]. \tag{7.A5}$$
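The realisation (7.A5) can be checked against the definition (7.A4) by evaluating both sides at an arbitrary complex point. The snippet below is an illustrative check only; the matrices are placeholders chosen so that $A$ is nonsingular, and the function names are hypothetical.

```python
import numpy as np

def tf(A, B, C, D, z):
    """Evaluate C (zI - A)^{-1} B + D at a complex point z."""
    return C @ np.linalg.solve(z * np.eye(A.shape[0]) - A, B) + D

def para_hermitian(A, B, C, D):
    """State-space realisation of G~(z) = G(1/z)^T for nonsingular A, cf. (7.A5)."""
    Ait = np.linalg.inv(A).T                       # A^{-T}
    return Ait, Ait @ C.T, -B.T @ Ait, D.T - B.T @ Ait @ C.T

rng = np.random.default_rng(3)
A = np.array([[0.5, 0.1, 0.0], [0.0, 0.4, 0.2], [0.1, 0.0, 0.3]])
B = rng.standard_normal((3, 2))
C = rng.standard_normal((1, 3))
D = rng.standard_normal((1, 2))

z0 = 0.7 + 0.4j
lhs = tf(*para_hermitian(A, B, C, D), z0)          # realisation (7.A5) at z0
rhs = tf(A, B, C, D, 1.0 / z0).T                   # definition (7.A4) at z0
```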



B. Norms and spaces 

Let $g = \{g(k)\}_{k=0}^{\infty}$ denote a sequence of terms $g(k) \in \mathbb{R}^n\ \forall k$. Define the
following normed functional space:

$$l_2 = l_2[0, \infty) = \{g\colon \|g\|_2 < \infty\}, \tag{7.B1}$$

where

$$\|g\|_2 = \left(\sum_{k=0}^{\infty} g(k)^{T}g(k)\right)^{1/2}. \tag{7.B2}$$

The dynamic system $G\colon u \mapsto Gu$ can be characterised by a matrix norm:

$$\|G\|_\infty = \sup_{0 \ne u \in l_2[0,\infty)} \frac{\|Gu\|_2}{\|u\|_2}, \tag{7.B3}$$

which is induced by the quadratic norm $\|\cdot\|_2$. If $G(z)$ is a transfer function
matrix of a given finite-dimensional, linear and time-invariant system $G$, then

$$\|G\|_\infty = \sup_{\theta \in (-\pi, \pi]} \bar\sigma\big(G(e^{j\theta})\big), \tag{7.B4}$$

where $\bar\sigma(\cdot)$ denotes a spectral norm$^{15}$ of a matrix argument.

$^{14}$ That has non-zero eigenvalues: $0 \notin \lambda(A)$.
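Formula (7.B4) suggests a simple, if crude, way of estimating the norm: evaluate the largest singular value of $G(e^{j\theta})$ on a frequency grid and take the maximum. The sketch below does exactly that; it yields a lower bound whose accuracy depends on the grid density, and it is not the bisection-type algorithm normally used in practice. The function name and the example system are hypothetical.

```python
import numpy as np

def hinf_norm_grid(A, B, C, D, n_grid=2001):
    """Grid estimate (lower bound) of ||G||_inf for a stable discrete-time system."""
    n = A.shape[0]
    best = 0.0
    for theta in np.linspace(-np.pi, np.pi, n_grid):
        z = np.exp(1j * theta)
        G = C @ np.linalg.solve(z * np.eye(n) - A, B) + D
        best = max(best, np.linalg.norm(G, 2))     # largest singular value
    return best

# G(z) = 1 / (z - 0.5): the peak gain 1 / (1 - 0.5) = 2 occurs at theta = 0
A = np.array([[0.5]])
B = np.array([[1.0]])
C = np.array([[1.0]])
D = np.array([[0.0]])
norm_estimate = hinf_norm_grid(A, B, C, D)
```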

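Let $R = R^{p\times r}$ denote the space of real-rational $p\times r$-matrix-valued
functions of the complex variable $z \in \mathbb{C}$. The subspace of proper functions$^{16}$
that are analytic on $o(z)$ is denoted by $RL_\infty = RL_\infty^{p\times r}$. The subspace of
$RL_\infty^{p\times r}$ consisting of all (stable) functions that are analytic outside the closed
disk $\bar v(z) = \{z \in \mathbb{C}: |z| \le 1\}$ is distinguished as $RH_\infty = RH_\infty^{p\times r}$.

The set $BH_\infty$ of all unitary-bounded functions in $RH_\infty^{p\times r}$ is identified
by

$$BH_\infty = \{F \in RH_\infty^{p\times r}\colon \|F\|_\infty < 1\}, \tag{7.B5}$$

while the group $GH_\infty$ of all (stable and minimum-phase$^{17}$) units of $RH_\infty^{p\times p}$
is settled as

$$GH_\infty = \{F \in RH_\infty^{p\times p}\colon F^{-1} \in RH_\infty^{p\times p}\}. \tag{7.B6}$$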



C. Factorisation 



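Let $J_{mn}(\gamma) \in \mathbb{R}^{(m+n)\times(m+n)}$, for $\gamma \in \mathbb{R}$, be a signature matrix defined as
follows:

$$J_{mn}(\gamma) = I_m \oplus (-\gamma^2I_n) = \begin{bmatrix} I_m & 0_{m\times n} \\ 0_{n\times m} & -\gamma^2I_n \end{bmatrix}. \tag{7.C1}$$

Additionally, we shall introduce $J_{mn} = J_{mn}(1)$.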

The transfer function G{z) G ^ is said to be dual 

i^Jvrm «A;p)“Unitary if 



G{z)J^pG^{z) = J^rn, \/z. (7.C2) 

A dual (J^^, Jvp)-unitary transfer function G{z) G ^ is said 

to be dual {Jvm, Ji;p)-lossless if 

G{z)J^pG^{z)>J,m, 'iziv{z). (7.C3) 

If a function G € ^ G+p) expressed as 

G = 0$, (7.C4) 

where fi £ GH^”^ and ^ is a dual {J^m, Jvp)-^oss\ess 

function, we say that G has a dual {Jvm, J^;p) -lossless factorisation. 

$^{15}$ Namely, the maximal singular value (see the footnotes 4-6 of Subsection 5.4.2 in
Chapter 5).

$^{16}$ Of the space $R$.

$^{17}$ Having a stable inverse.
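
D. Discrete Riccati equation

Let $(U, W) \in \mathbb{R}^{2n\times 2n}\times\mathbb{R}^{2n\times 2n}$ denote a pair of real square matrices:

$$(U, W) = \left(\begin{bmatrix} I_n & R \\ 0_{n\times n} & P^{T} \end{bmatrix},\ \begin{bmatrix} P & 0_{n\times n} \\ -Q & I_n \end{bmatrix}\right), \tag{7.D1}$$

composed of real square matrices $P$, $Q = Q^{T}$, $R = R^{T} \in \mathbb{R}^{n\times n}$. Moreover,
let us assume that there exist three other real matrices $X_1, X_2, \Lambda \in \mathbb{R}^{n\times n}$
for which the following equation holds:

$$W\begin{bmatrix} X_1 \\ X_2 \end{bmatrix} = U\begin{bmatrix} X_1 \\ X_2 \end{bmatrix}\Lambda. \tag{7.D2}$$

If the matrix $X_1$ is nonsingular, we have

$$\begin{bmatrix} P & 0_{n\times n} \\ -Q & I_n \end{bmatrix}\begin{bmatrix} I_n \\ X \end{bmatrix}X_1 = \begin{bmatrix} I_n & R \\ 0_{n\times n} & P^{T} \end{bmatrix}\begin{bmatrix} I_n \\ X \end{bmatrix}X_1\Lambda, \tag{7.D3}$$

where $X = X_2X_1^{-1} \in \mathbb{R}^{n\times n}$. It thus immediately follows that

$$X_1\Lambda X_1^{-1} = (I_n + RX)^{-1}P, \tag{7.D4}$$

$$\begin{bmatrix} I_n & R \\ 0_{n\times n} & P^{T} \end{bmatrix}\begin{bmatrix} I_n \\ X \end{bmatrix} = \begin{bmatrix} I_n \\ P^{T}X(I_n + RX)^{-1} \end{bmatrix}(I_n + RX). \tag{7.D5}$$

Moreover, by virtue of (7.D4), from (7.D3) we derive the following equation:

$$P^{T}X(I_n + RX)^{-1}P - X + Q = 0_{n\times n}. \tag{7.D6}$$

After simple calculations we easily verify that $(I_n + RX)^{-1}P = P -
R(I_n + XR)^{-1}XP$. On this account we readily conclude that $X$ satisfies the
following discrete-time algebraic Riccati equation:

$$P^{T}XP - X - P^{T}XR(I_n + XR)^{-1}XP + Q = 0_{n\times n}. \tag{7.D7}$$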

If for a given pair $(U, W)$ there exist $X \in \mathbb{R}^{n\times n}$ and a stable $\Lambda \in \mathbb{R}^{n\times n}$
(with $\lambda(\Lambda) \subset v(z)$) such that$^{18}$

$$W\begin{bmatrix} I_n \\ X \end{bmatrix} = U\begin{bmatrix} I_n \\ X \end{bmatrix}\Lambda, \tag{7.D8}$$

then we say that the pair $(U, W)$ belongs to the domain of a discrete Riccati
operator:

$$\mathrm{DRic}\colon (U, W) \mapsto X\colon\ \mathbb{R}^{2n\times 2n}\times\mathbb{R}^{2n\times 2n} \to \mathbb{R}^{n\times n}, \tag{7.D9}$$

which is denoted by $(U, W) \in \mathrm{dom}\,(\mathrm{DRic})$.

$^{18}$ That is, we assume that $X_1 = I_n$ in the above equations.

A necessary condition for $(U, W) \in \mathrm{dom}\,(\mathrm{DRic})$ takes the form of a
conjunction of the following two requirements:

(i) the generalised eigenvalues of a matrix pencil $(zU - W)$ associated
with the pair $(U, W)$ lie off the unit circle $o(z)$, and

(ii) there exists at least one stabilising solution to (7.D6), i.e., $X \in \mathbb{R}^{n\times n}$
for which $(I_n + RX)^{-1}P$ is a stable matrix: $\lambda((I_n + RX)^{-1}P) \subset v(z)$.

Such a solution can simply be written as $X = \mathrm{DRic}\,(U, W)$.

The following lemma summarises some basic properties of DAREs in the
general case considering $X_1 \ne I_n$.


Lemma 7.1. If $(U, W) \in \mathrm{dom}\,(\mathrm{DRic})$ and $X = \mathrm{DRic}\,(U, W)$, then:

(i) $X = X_2X_1^{-1}$ is unique and symmetric, $X = X^{T}$,

(ii) $I_n + XR$ is nonsingular,

(iii) $X$ satisfies the DARE:

$$P^{T}XP - X - P^{T}XR(I_n + XR)^{-1}XP + Q = P^{T}X(I_n + RX)^{-1}P - X + Q = 0_{n\times n}, \tag{7.D10}$$

(iv) $(I_n + RX)^{-1}P = P - R(I_n + XR)^{-1}XP$ is stable.

The equations (7.D7) and (7.D10) can be solved by employing an
appropriate generalised eigenvalue problem applied to the matrix pencil associated
with a given pair $(U, W)$ (Laub, 1991; Van Dooren, 1981).
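The eigenvalue route can be illustrated with a small sketch. It is deliberately simplified: instead of a numerically robust generalised Schur (QZ) method of the kind discussed in the cited references, it assumes that $P$ is nonsingular, so that $U$ can be inverted and an ordinary eigendecomposition of $U^{-1}W$ can be used. The function name `dric` and the test data are hypothetical.

```python
import numpy as np

def dric(P, Q, R):
    """Stabilising DARE solution X = X2 X1^{-1} from the stable eigenspace.

    Simplifying assumption: P is nonsingular, hence U is invertible and the
    pencil problem W v = lambda U v reduces to an ordinary eigenproblem.
    """
    n = P.shape[0]
    I, Z = np.eye(n), np.zeros((n, n))
    U = np.block([[I, R], [Z, P.T]])
    W = np.block([[P, Z], [-Q, I]])
    lam, V = np.linalg.eig(np.linalg.solve(U, W))
    stable = np.abs(lam) < 1.0
    if stable.sum() != n:
        raise ValueError("the pencil does not have exactly n stable eigenvalues")
    X1, X2 = V[:n, stable], V[n:, stable]
    return np.real(X2 @ np.linalg.inv(X1))

P = np.array([[1.1, 0.2], [0.0, 0.9]])
Q = np.eye(2)
R = np.eye(2)
X = dric(P, Q, R)

In = np.eye(2)
residual = P.T @ X @ P - X - P.T @ X @ R @ np.linalg.inv(In + X @ R) @ X @ P + Q  # (7.D7)
closed_loop = np.linalg.solve(In + R @ X, P)                                      # (I + RX)^{-1} P
```

The last two lines correspond to items (iii) and (iv) of Lemma 7.1: the residual of the DARE should vanish and the matrix $(I_n + RX)^{-1}P$ should be stable.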

Artificial Intelligence 




Chapter 8 



EVOLUTIONARY METHODS IN DESIGNING 
DIAGNOSTIC SYSTEMS^ 

Andrzej OBUCHOWICZ*, Jozef KORBICZ* 



8.1. Introduction 

One of the most important methodologies of designing Fault Detection and
Isolation (FDI) systems is based on the model of a diagnosed system. In
a general case, this concept can be realized using different types of
analytical, knowledge-based, neural or fuzzy logic-based models (Köppen-Selinger
and Frank, 1999). Models are constructed for systems working both in
nominal and faulty conditions. Conventional fault diagnosis systems use analytical
models, e.g., Luenberger observers or Kalman filters (Chen and Patton, 1999).
The system's dynamical properties are described by a set of differential
equations or mapping functions with a suitable set of parameters.

Unfortunately, the application of analytical models is usually limited to
linear systems or systems in which the linearization process causes relatively
small errors. When no mathematical model is available, or when the
complexity of the model makes its implementation practically impossible,
analytical models either cannot be applied or yield unsatisfactory results.
In these cases artificial intelligence techniques, based on a knowledge
base or on input/output information, become an attractive tool for
constructing desired models. The input/output mapping is represented by a rule base
(Amann and Frank, 1997), or by a set of parameters, which are identified
using a set of training data. If only a set of pairs of the input and output
signals is known, then neural models (Korbicz et al., 1998), fuzzy sets (Koscielny,
2001), genetic algorithms (Chen and Patton, 1999; Witczak et al., 1999) or
their combinations (Korbicz et al., 1999; Köppen-Selinger and Frank, 1999)
can be applied.

Although there are many techniques for designing non-analytical models,
all of them, sooner or later, reduce to a certain set of optimization
problems, e.g., searching for an optimal structure of the model or calculating
its parameters. These problems are usually multidimensional, multimodal,
and, sometimes, multi-criteria. Classical local optimization methods are
insufficient to solve them. Recently, direct search methods have been proposed
(Goldberg, 1989) to solve optimization problems. Unlike advanced classical
optimization methods, direct search methods do not require the
differentiability of the objective function. They possess the ability of crossing saddles
of the searched landscape. This property increases the chance of finding
the global optimum for multimodal objective functions.

Evolutionary algorithms (EAs) seem to be especially attractive search
methods (Bäck et al., 1997). They form a broad class of stochastic adaptation
algorithms inspired by biological processes that allow populations of
organisms to adapt to their surrounding environment (Bäck et al., 1997; Goldberg,
1989; Michalewicz, 1996). Evolutionary algorithms combine the Darwinian
principle stating that new generations are created by better-fitted individuals
through the elimination of wrong features and random information exchange
using knowledge acquired by previous generations. This search mechanism
is surprisingly effective in adaptation and global optimization
problems.

In this chapter, the process of constructing a fault diagnosis system is
considered as a problem composed of a set of global optimization tasks, and
evolutionary algorithms are shown as attractive tools for solving these
tasks.



8.2. Evolutionary algorithms 

The structure and properties of evolutionary algorithms are discussed in
several excellent books (Bäck et al., 1997; Goldberg, 1989; Michalewicz, 1996).
Papers concerned with evolutionary computation are published in many
scientific journals. There are at least 20 international conferences closely
connected with evolutionary methods. Due to the large number of available
publications, it is impossible to present all the different evolutionary
algorithms and their components that are used to improve algorithm efficiency for
a given problem to be solved. In this chapter, the main components
of evolutionary algorithms are revisited and their different basic forms are
briefly discussed.

8.2.1. Basic concepts of evolutionary search 

In nature, individuals in a population compete with each other for resources 
such as food, water and shelter. Also, members of the same species often com- 
pete to attract a mate. Those individuals which are most successful in sur- 
viving and attracting mates will have relatively larger numbers of offspring. 
Poorly performing individuals will produce very few, or even no offspring 
at all. This means that information (genes), slightly mutated, from highly 
adapted individuals will spread to an increasing number of individuals in 
each successive generation. In this way, species evolve to become more and 
more suited to their environment. 

In order to describe a general outline of an evolutionary algorithm, let us
introduce several useful concepts and notations (Fogel, 1999). An evolutionary
algorithm is based on the collective learning process within a population
$P(t) = \{a_k \in \mathcal{G} \mid k = 1, 2, \ldots, \eta\}$ of $\eta$
individuals, each of which represents a genotype (an underlying genetic
coding), a search point in the so-called genotype space $\mathcal{G}$. The
environment delivers quality information (fitness value) about the individual,
dependent on its phenotype (the manner of response contained in the behavior,
physiology and morphology of the organism). The fitness function
$\Phi : \mathcal{D} \to \mathbb{R}$ is defined on a phenotype space
$\mathcal{D}$. So, each individual can be viewed as a duality of its genotype
and phenotype, and some decoding function (epigenesis)
$\xi : \mathcal{G} \to \mathcal{D}$ is needed.

At the beginning, the population is arbitrarily initialized and evaluated
(Tab. 8.1). Next, the randomized processes of reproduction, recombination,
mutation and succession are iteratively repeated until a given termination
criterion $\iota : \mathcal{G}^{\eta} \to \{\text{true}, \text{false}\}$ is
satisfied. Reproduction, also called preselection,
$s_{\Theta_p} : \mathcal{G}^{\eta} \to \mathcal{G}^{\eta'}$, is a randomized
process (deterministic in some algorithms) of selecting $\eta'$ parents from
the $\eta$ individuals of the current population. This process is controlled
by a set $\Theta_p$ of parameters. The recombination mechanism (omitted in
some realizations),
$r_{\Theta_r} : \mathcal{G}^{\eta'} \to \mathcal{G}^{\eta''}$, controlled by
additional parameters $\Theta_r$, allows the mixing of parental information
while passing it to the descendants. Mutation,
$m_{\Theta_m} : \mathcal{G}^{\eta''} \to \mathcal{G}^{\eta''}$, introduces
innovation into the current descendants; $\Theta_m$ is again a set of control
parameters. Succession, also called postselection,
$s_{\Theta_s} : \mathcal{G}^{\eta} \times \mathcal{G}^{\eta''} \to \mathcal{G}^{\eta}$,
is applied to choose a new generation of individuals from parents and
descendants.

The duality of the genotype and phenotype suggests two main approaches 
to simulated evolution (Fogel, 1999). In genotypic simulations, the attention 
is focused on genetic structures. The candidate solutions are described as 
being analogous to chromosomes and genes. The entire searching process 
is performed in the genotype space $\mathcal{G}$. However, in order to calculate the
individual’s fitness, its chromosome must be decoded to its phenotype. Two 
main streams of instances of such evolutionary algorithms can nowadays be 
identified: 

• Genetic Algorithms (GA) (Goldberg, 1989; Michalewicz, 1996), 

• Genetic Programming (GP) (Koza, 1992). 

In phenotypic simulations, the attention is focused on the behavior of
the candidate solutions in a population. All searching operations (selection,
reproduction and mutation) are constructed in the phenotype space
$\mathcal{D}$. This type of simulation is characterized by a strong behavioral
link between a parent and its offspring. Nowadays, there are three main
streams of instances of phenotypic evolutionary algorithms:

• Evolutionary Programming (EP) (Fogel, 1999), 

• Evolutionary Strategies (ES) (Bäck, 1995),

• Evolutionary Search with Soft Selection (ESSS) (Galar, 1989). 



Table 8.1. General scheme of an evolutionary algorithm

I. Initiation
   A. Random generation
      $P(0) = \{a_k(0) \mid k = 1, 2, \ldots, \eta\}$
   B. Evaluation
      $P(0) \to \Phi(P(0)) = \{q_k(0) = \Phi(\xi(a_k(0))) \mid k = 1, 2, \ldots, \eta\}$
   C. $t = 0$

II. Repeat:
   A. Reproduction
      $P'(t) = s_{\Theta_p}(P(t)) = \{a'_k(t) \mid k = 1, 2, \ldots, \eta'\}$
   B. Recombination
      $P''(t) = r_{\Theta_r}(P'(t)) = \{a''_k(t) \mid k = 1, 2, \ldots, \eta''\}$
   C. Mutation
      $P'''(t) = m_{\Theta_m}(P''(t)) = \{a'''_k(t) \mid k = 1, 2, \ldots, \eta''\}$
   D. Evaluation
      $P'''(t) \to \Phi(P'''(t)) = \{q_k(t) = \Phi(\xi(a'''_k(t))) \mid k = 1, 2, \ldots, \eta''\}$
   E. Succession
      $P(t+1) = s_{\Theta_s}(P(t) \cup P'''(t)) = \{a_k(t+1) \mid k = 1, 2, \ldots, \eta\}$
   F. $t = t + 1$
   Until $\iota(P(t)) = \text{true}$
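
The scheme of Tab. 8.1 can be sketched as a short Python loop. This is only an illustrative sketch: the function names (`evolve`, `demo`) and the concrete operators used in the demonstration (binary tournament reproduction, Gaussian mutation, succession keeping the best individuals of parents and descendants) are assumptions made here and are not prescribed by the table.

```python
import random


def evolve(init, fitness, reproduce, recombine, mutate, succeed, terminate):
    """Generic loop following Tab. 8.1; all operators are supplied by the caller."""
    population = init()                                   # I.A random generation
    scores = [fitness(a) for a in population]             # I.B evaluation
    t = 0
    while not terminate(population, scores, t):           # termination criterion
        parents = reproduce(population, scores)           # II.A reproduction
        offspring = mutate(recombine(parents))            # II.B and II.C
        offspring_scores = [fitness(a) for a in offspring]  # II.D evaluation
        population, scores = succeed(population, scores,
                                     offspring, offspring_scores)  # II.E
        t += 1                                            # II.F
    return population, scores


def demo(seed=0, eta=20):
    """Hypothetical usage: maximize -(x - 3)^2 over the real line."""
    rng = random.Random(seed)

    def keep_best(pop, q, off, off_q):
        # Assumed succession rule: the eta best of parents and descendants.
        ranked = sorted(zip(q + off_q, pop + off), reverse=True)[:eta]
        return [a for _, a in ranked], [s for s, _ in ranked]

    population, _ = evolve(
        init=lambda: [rng.uniform(-10.0, 10.0) for _ in range(eta)],
        fitness=lambda x: -(x - 3.0) ** 2,
        reproduce=lambda pop, q: [max(rng.sample(list(zip(q, pop)), 2))[1]
                                  for _ in pop],
        recombine=lambda parents: parents,                # recombination omitted
        mutate=lambda pop: [x + rng.gauss(0.0, 0.3) for x in pop],
        succeed=keep_best,
        terminate=lambda pop, q, t: t >= 200,
    )
    return population[0]                                  # best individual found
```

Passing the operators as arguments mirrors the table: exchanging one operator (for example, the succession rule) changes the type of the algorithm without touching the loop.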



The choice of the type of the evolutionary algorithm depends on the type
of the optimization problem to be solved. Genotypic EAs are natural tools for
discrete optimization problems, because the genotype space is a discrete space
and, usually, there are no problems with defining an isomorphism between
the genotype space and the discrete domain of the objective function.
Phenotypic EAs are recommended for continuous parameter optimization.

8.2.2. Some evolutionary algorithms

Three types of EAs are presented below: the GA, GP and ESSS, which have 
been applied to fault diagnosis systems described in this chapter. 

Genetic algorithms 

GAs are probably the most widely known evolutionary algorithms, receiving
remarkable attention all over the world. The basic principles of GAs were first
laid down rigorously by Holland (1975). The originally proposed forms of
GAs, called Simple GAs (SGAs) (Goldberg, 1989), operate on binary strings
of fixed length $l$, i.e., the genotype space $\mathcal{G}$ is an
$l$-dimensional Hamming cube $\mathcal{G} = \{0, 1\}^l$. SGAs are a natural
technique for solving discrete problems, especially when the set of possible
solutions is finite. Such a problem can be transformed into a pseudo-Boolean
fitness function, to which GAs can be applied directly. Difficulties occur in
the case of continuous parameter optimization problems, which need a function
$\zeta : \mathcal{D} \to \mathcal{G}$ encoding the variables of a given
problem into a bit string, the so-called chromosome. The encoding function
$\zeta$ is non-invertible, so there is no inverse function $\zeta^{-1}$.
A decoding function $\xi : \mathcal{G} \to \mathcal{D}_{\xi} \subset \mathcal{D}$
generates only $2^l$ representatives of solutions. This is a strong limitation
of SGAs.

The parent selection is carried out using the so-called proportional
method (roulette method):

$$\{h_1, h_2, \ldots, h_\eta\} : \quad h_k = \min\left\{ h : \frac{\sum_{i=1}^{h} q_i}{\sum_{i=1}^{\eta} q_i} > \chi_k \right\}, \tag{8.1}$$

where $\{\chi_k = U(0, 1) \mid k = 1, 2, \ldots, \eta\}$ are uniformly
distributed random numbers from the interval $[0, 1)$. In this type of
selection, the probability that a given chromosome will be chosen as a parent
is proportional to its fitness. Because sampling is carried out with
replacement, it can be expected that well-fitted individuals will insert
several of their copies into the temporary population $P'(t)$.
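
The rule (8.1) can be illustrated by the following Python sketch. The function name `roulette_select` and the 0-based indexing are choices made for this illustration; the fitness values are assumed to be non-negative with a positive sum.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def roulette_select(fitness, n_parents, rng=random):
    """Return indices of parents drawn by proportional (roulette) selection."""
    total = sum(fitness)
    # Normalized cumulative sums: cumulative[h] = sum(q_0..q_h) / sum(all q_i).
    cumulative = [c / total for c in accumulate(fitness)]
    parents = []
    for _ in range(n_parents):
        chi = rng.random()                     # uniform number from [0, 1)
        # Smallest index h whose cumulative share exceeds chi, as in (8.1).
        h = bisect_right(cumulative, chi)
        # Guard against rounding when the last cumulative value is below 1.
        parents.append(min(h, len(cumulative) - 1))
    return parents
```

Since the draws are independent, an individual with a large fitness share is typically returned several times, which corresponds to sampling with replacement.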

Chromosomes from $P'(t)$ are recombined. In the case of the GA, the
recombination operation is crossover. Chromosomes from $P'(t)$ are joined into
pairs. The decision whether a given pair will be recombined is made with a
given probability $\theta_r$. If the decision is positive, then an $i$-th
position in the chromosome is randomly chosen and the information from the
position $(i + 1)$ to the end of the chromosomes is exchanged in the pair:

$$\left\{ \begin{array}{l} (a_1, a_2, \ldots, a_l) \\ (b_1, b_2, \ldots, b_l) \end{array} \right\} \to \left\{ \begin{array}{l} (a_1, \ldots, a_i, b_{i+1}, \ldots, b_l) \\ (b_1, \ldots, b_i, a_{i+1}, \ldots, a_l) \end{array} \right\}.$$

The newly obtained temporary population $P''(t)$ is mutated. An individual's
mutation $m_{\Theta_m}$ is done separately for each bit in a chromosome. The
bit value is changed to the opposite one with a given probability $\theta_m$.
The obtained population is the population of a new generation.
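
A minimal Python sketch of the two SGA operators described above is given below. Chromosomes are assumed to be lists of 0/1 integers of length at least two; the function names are hypothetical.

```python
import random


def one_point_crossover(a, b, p_cross, rng=random):
    """Exchange the tails of two chromosomes with probability p_cross."""
    if rng.random() < p_cross:
        # Cut after position i (1 <= i <= l - 1); genes i+1..l are exchanged.
        i = rng.randint(1, len(a) - 1)
        return a[:i] + b[i:], b[:i] + a[i:]
    return a[:], b[:]                          # pair passed on unchanged


def bit_flip_mutation(chromosome, p_mut, rng=random):
    """Flip every bit independently with probability p_mut."""
    return [1 - bit if rng.random() < p_mut else bit for bit in chromosome]
```

For example, crossing `[0, 0, 0, 0]` with `[1, 1, 1, 1]` at the cut `i = 1` yields `[0, 1, 1, 1]` and `[1, 0, 0, 0]`.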



Genetic programming 

Many trends in the development of the SGA are connected with changing
the representation of an individual. One of them deserves particular
attention: each individual is a tree (Koza, 1992). This small change in the GA
gives evolutionary techniques the possibility of solving problems which could
not be tackled before. This type of the GA is called Genetic Programming (GP).

Two sets need to be defined before GP starts: the set of terms $T$ and the
set of operators $F$. In the initiation step, a population of trees is
randomly chosen. For each tree, leaves are chosen from the set $T$ and other
nodes are chosen from the set $F$. Depending on the definitions of $T$ and
$F$, a tree can represent a polyadic function, a logical sentence or part of
a programme code in a given programming language. Figure 8.1 presents a sample
tree for $T = \{x, y, z, 2, \pi\}$ and $F = \{+, *, \sin\}$. The new type of
the individual's representation requires new definitions of the crossover and
mutation operators, both of which are explained in Fig. 8.2.




Fig. 8.1. Sample of a tree which represents the function $yz + \sin(2\pi x)$
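
To make the tree representation concrete, the function $yz + \sin(2\pi x)$ can be encoded and evaluated as in the Python sketch below. The nested-tuple encoding, the dictionary `FUNCTIONS` and the name `evaluate` are assumptions of this illustration; they are not taken from Koza (1992).

```python
import math

# Operator set F: name -> (arity, implementation).
FUNCTIONS = {
    "+": (2, lambda a, b: a + b),
    "*": (2, lambda a, b: a * b),
    "sin": (1, math.sin),
}


def evaluate(node, env):
    """Recursively evaluate a tree; leaves are variable names or numbers."""
    if isinstance(node, tuple):                # inner node: (operator, children...)
        op, *children = node
        arity, fn = FUNCTIONS[op]
        if len(children) != arity:
            raise ValueError(f"operator {op!r} expects {arity} argument(s)")
        return fn(*(evaluate(child, env) for child in children))
    if isinstance(node, str):                  # leaf: variable or named constant
        return env[node]
    return node                                # leaf: numeric constant


# Tree for y*z + sin(2*pi*x), built from T = {x, y, z, 2, pi}.
TREE = ("+", ("*", "y", "z"), ("sin", ("*", ("*", 2, "pi"), "x")))
```

With `env = {"x": 0.25, "y": 2.0, "z": 3.0, "pi": math.pi}` the tree evaluates to $2 \cdot 3 + \sin(\pi/2) = 7$.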



[Figure panels: crossover; mutation (random subtree)]


Fig. 8.2. Genetic operators for GP 

When the tree represents a function, the elements of the set $T$ can be
divided into variables and constants. The number of variables is determined by
the dimension of the search space. Constants are calculated by processing
some base constants. Koza (1992) proposed only one elementary constant. In
order to obtain a constant parameter of the represented function, the tree
may have to become very deep. This poses problems with tree implementation
and interpretation, and the complexity of the genetic process rapidly
increases. An alternative is to assign a weight parameter to each node
of the tree (Witczak et al., 1999), where the weight multiplies the output
of the corresponding node. Then the set $T$ is reduced to the set of variables
only. The genetic process operates only on tree structures, and thus on
the space of function structures. The weights of the tree are obtained
using identification methods. Although this idea is very simple, the tree can
possess so many parameters that the function cannot be identified. An
effective algorithm of weight allocation is proposed in (Witczak et al.,
2002). This algorithm is described in detail in Chapter 12.

Evolutionary search with soft selection 

The algorithm of evolutionary search with soft selection is based on a simple
selection-mutation model of phenotype evolution (Galar, 1989). An individual
is represented by a point in the real space $\mathbb{R}^n$. The reproduction
process, similarly to the SGA, consists in a random choice, with replacement,
of $\eta$ parents $P'(t)$ from the $\eta$ individuals of the current
population $P(t)$ using the roulette method (8.1). There are no recombination
operators in ESSS; only mutation is responsible for modifications in the next
population. Every element of $P'(t)$ is mutated. The mutation consists in a
random change of each co-ordinate by adding a normally distributed random
value with expectation equal to zero and a given standard deviation $\sigma$:

$$P(t+1) = \left\{ a_k(t+1) : \big(a_k(t+1)\big)_i = \big(a'_k(t)\big)_i + N(0, \sigma) \;\middle|\; i = 1, 2, \ldots, n; \; k = 1, 2, \ldots, \eta \right\}. \tag{8.2}$$

The mutated individuals create a new generation of the algorithm.
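
The ESSS iteration (roulette reproduction followed by the Gaussian mutation (8.2)) can be sketched with NumPy as follows. The function name `esss` and its argument list are assumptions of this sketch, and the fitness is assumed to be non-negative with a positive sum, as required by proportional selection.

```python
import numpy as np


def esss(fitness, initial_population, sigma, n_generations, rng):
    """Evolutionary search with soft selection on points of R^n."""
    population = np.asarray(initial_population, dtype=float)   # shape (eta, n)
    eta = len(population)
    for _ in range(n_generations):
        q = np.array([fitness(a) for a in population])
        # Roulette (proportional) selection of eta parents with replacement.
        parents = rng.choice(eta, size=eta, p=q / q.sum())
        # Every parent is mutated: N(0, sigma) is added to each co-ordinate.
        population = population[parents] + rng.normal(0.0, sigma,
                                                      size=population.shape)
    return population
```

A possible call is `esss(f, np.zeros((20, 2)), sigma=0.05, n_generations=300, rng=np.random.default_rng(0))`, where `f` is any positive fitness function, e.g., a Gaussian peak.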

Based on the ESSS algorithm, an algorithm successfully applied to the on-line
training of dynamic neural networks was developed (Obuchowicz, 1999). ESSS
with Forced Direction of Mutation (ESSS-FDM) accelerates saddle crossing and
improves adaptation in non-stationary environments by adapting the
expectation of the mutation operator. Chosen parent individuals are mutated
with expectation $m(t) \neq 0$, unlike in ESSS, where $m(t) = 0$. The
direction of the vector $m(t)$ is parallel to the last population drift:

$$m(t) = \mu \, \frac{E[P(t)] - E[P(t-1)]}{\left\| E[P(t)] - E[P(t-1)] \right\|}, \tag{8.3}$$

where $E[P(t)]$ is the expectation vector of the population $P(t)$ and $\mu$
is a control parameter of the algorithm.

A similar method is known in the area of neural networks as the momentum
technique in the back-propagation training process (Korbicz et al., 1994).
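
The forced direction of mutation (8.3) can be sketched as below. It is assumed here that $E[P(t)]$ is estimated by the arithmetic mean of the individuals in the population; the function names are hypothetical.

```python
import numpy as np


def forced_direction(population_now, population_prev, mu):
    """Mutation expectation m(t): length mu, parallel to the last drift."""
    drift = np.mean(population_now, axis=0) - np.mean(population_prev, axis=0)
    norm = np.linalg.norm(drift)
    if norm == 0.0:                            # no drift: fall back to plain ESSS
        return np.zeros_like(drift)
    return mu * drift / norm


def fdm_mutation(parents, m, sigma, rng):
    """Mutate every parent with N(m, sigma) added to each co-ordinate."""
    return parents + rng.normal(loc=m, scale=sigma, size=parents.shape)
```

If the population centre moved from the origin to the point (3, 4) and $\mu = 0.05$, the next mutations are shifted by the vector (0.03, 0.04).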



8.3. Optimization tasks in designing FDI systems 

There are two steps of signal processing in model-based FDI systems: symptom
extraction (residual generation) and, based on these residuals, making
decisions about the appearance of faults, their location and size (residual
evaluation) (Fig. 8.3). The residual is generated by comparing the measured
output of the diagnosed system $y(k)$ with the output of its model $y_M(k)$.
The resulting difference $r(k) = y(k) - y_M(k)$ (residual signal) should be
close to zero under nominal operating conditions and deviate significantly
from zero when a fault appears. In order to determine whether a fault has
appeared and of what kind it is, the residual signal is analyzed in
the residual evaluation module.



[Figure: block diagram; surviving labels: u(k), y(k), "NN training via the EA"]

Fig. 8.3. Evolutionary algorithms in the process of designing an FDI system

Publications on EA applications to the design of FDI systems are relatively
scarce. The proposed solutions (Chen and Patton, 1999; Chen et al., 1996;
Korbicz et al., 1998; Obuchowicz, 1999; Witczak and Korbicz, 2000; Witczak
et al., 1999; 2002) show high efficiency of diagnostic systems whose design
has been aided by EAs. Attention should be paid to works on GP approaches to
the modelling of dynamic non-linear systems: by choosing the gain matrix of
the robust non-linear observer (Witczak et al., 1999), searching for the MIMO
NARX model (Multi Input Multi Output Nonlinear AutoRegressive with eXogenous
variable) (Witczak and Korbicz, 2000), or by designing the Extended Unknown
Input Observer (EUIO) (Witczak et al., 2002). Chen and co-workers (Chen and
Patton, 1999; Chen et al., 1996) present the problem of designing the robust
linear observer as a multi-criteria optimization problem. Genetic algorithms
are an effective technique for solving this problem.

Among artificial intelligence methods applied to fault diagnosis systems,
artificial neural networks, used to construct neural models as well as neural
classifiers, are very popular (Frank and Köppen-Selinger, 1997;
Köppen-Selinger and Frank, 1999; Korbicz et al., 1998). Neural network
approaches to the construction of FDI systems are the topic of other chapters
of this book. However, the construction of a neural model involves two basic
optimization problems: the optimization of the neural network's architecture
and its training process, i.e., searching for an optimal set of free network
parameters. Evolutionary algorithms are very useful tools for solving both
problems, especially in the case of dynamic neural networks (Korbicz et al.,
1998; Obuchowicz, 1999; 2000).

EA applications to the design of the residual evaluation module are also
promising, but only the possibilities of such applications, rather than their
practical realizations, are presented in this chapter. Some of the most
interesting GA applications can be found in initial signal processing
(Kosiński et al., 1998), in fuzzy systems tuning (Carse et al., 1996;
Köppen-Selinger and Frank, 1999), and in the rule base construction of an
expert system (Frank and Köppen-Selinger, 1997; Koza, 1992; Skowroński, 1998).



8.4. Symptom extraction 

8.4.1. Choice of the gain matrix for the robust non-linear 
observer via genetic programming 

In this subsection, a method of constructing a robust non-linear observer on
the basis of the classical approach and genetic programming is presented
(Witczak et al., 1999). Let us consider a non-linear discrete system described
by the following system of equations:

$$x_{k+1} = f(x_k, u_k, w_k), \tag{8.4}$$

$$y_k = h(x_k, v_k), \tag{8.5}$$

where $u_k$ is the input signal, $y_k$ is the output signal, $x_k$ is the
state, $w_k$ and $v_k$ represent the system and measurement noises,
respectively, and $h(\cdot)$, $f(\cdot)$ are non-linear functions.

The problem reduces to the estimation $\hat{x}_k$ of the state of the system
(8.4)-(8.5), where the set of the input and output measurements and the
structure of the model are known. In accordance with the classical approach,
the state estimation has the form

$$\hat{x}_k = \hat{x}_k^- + K_k \varepsilon_k^-, \tag{8.6}$$

$$\varepsilon_k^- = y_k - h(\hat{x}_k^-), \tag{8.7}$$

where $\varepsilon_k^-$ is the a priori output error, $\hat{x}_k^-$ is the
a priori state estimate, and $K_k$ is the gain matrix.

The matrix $K_k$ of the observer (8.6)-(8.7) can be obtained using many
methods (e.g., the Kalman filter, the Luenberger observer), which, in most
cases, define $K_k$ as a matrix of constant values, and the obtained observer
is not robust to disturbances caused by the inaccuracy of the applied model.
In this approach, the gain matrix is assumed to be composed of functions
of the a priori output error and the input signal of the system:

$$\hat{x}_k = \hat{x}_k^- + K(\varepsilon_k^-, u_{k-1}) \, \varepsilon_k^-. \tag{8.8}$$

Now, the problem is to construct the set of functions forming the matrix
$K(\varepsilon_k^-, u_{k-1})$ based on the set of input/output signals and on
the mathematical model of the system. In order to construct the matrix of
functions $K(\varepsilon_k^-, u_{k-1})$, genetic programming is used. An
individual, in this case, is not a single tree but a set of trees, each tree
representing a different element of the gain matrix. The operator and term
sets are chosen in the forms $F = \{+, -, *, /\}$ and
$T = \{\varepsilon_k^-, u_k, c_1, c_2, \ldots, c_m\}$. As a fitness criterion,
whose minimum is searched for, the sum of normalized output errors is chosen.
The parent population is selected using the tournament method, i.e., each
parent is a winner, in the sense of the best fitness, from a randomly chosen
subpopulation of $q$ candidates. The crossover operator has the classical form
(Fig. 8.2), but the parent pair is chosen for each gain matrix element
independently. Similarly, mutation is activated for each temporary individual
and each of its elements with a given probability.
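
The tournament method mentioned above can be sketched in Python as follows. Since the fitness criterion is minimized here, the winner of each tournament is the candidate with the smallest error; the function name and the list-of-errors interface are assumptions of this sketch.

```python
import random


def tournament_select(errors, n_parents, q, rng=random):
    """Return indices of parents; each one wins a tournament of q candidates."""
    parents = []
    for _ in range(n_parents):
        # Randomly chosen subpopulation of q distinct candidates.
        candidates = rng.sample(range(len(errors)), q)
        # The winner has the best fitness, i.e., the smallest error.
        parents.append(min(candidates, key=errors.__getitem__))
    return parents
```

The tournament size $q$ controls the selection pressure: for $q$ equal to the population size the best individual always wins, while for $q = 1$ the choice is purely random.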

Example 8.1 Let us consider a second order discrete system described by 
the following equations: 

$$x_{1,k+1} = \tau \left( \frac{a x_{1,k} x_{2,k}}{x_{2,k} + b} - x_{1,k} u_k \right) + x_{1,k}, \tag{8.9}$$

$$x_{2,k+1} = \tau \left( -\frac{d\, a\, x_{1,k} x_{2,k}}{x_{2,k} + b} + (c - x_{2,k}) u_k \right) + x_{2,k}, \tag{8.10}$$

$$y_k = (x_{1,k} + e)\, x_{2,k} + v_k, \tag{8.11}$$

where $a$, $b$, $c$, $d$, $e$ are system parameters, and $\tau$ is the
sampling period. $u_k = 0.07 \sin(0.31 \tau k) + 0.38$ is the input signal,
and $v_k$ is a realization of an independent random variable representing
measurement noise with the normal distribution $N(0, 2 \cdot 10^{-\ldots})$.
The nominal values of the model parameters are equal to $a = 0.55$,
$b = 0.15$, $c = 0.8$, $d = 2.0$, $e = 0.01$. The initial states of the
observed system and of the observer are $x_0 = (0.21, 0.37)$ and
$\hat{x}_0 = (2.1, 1.6)$, respectively.

For the sake of comparison, an extended Kalman filter with the parameters
$P_0 = 1$, $R = 0.001$, $Q = 0$ and the same initial condition $\hat{x}_0$ is
used. Moreover, the parameters used during the evolution of gain matrices have
to be established: the initial height of trees is 30, $n_p = 40$ is the
population size, $S_{\min} = 1 \cdot 10^{-\ldots}$ is the desired fitness
value, $n_{\max} = 500$ is the maximal number of iterations, $n_t = 200$ is
the number of input/output measurements, $P_{\text{cross}} = 0.6$ and
$P_{\text{mut}} = 1 \cdot 10^{-\ldots}$ are the probabilities of the crossover
and mutation operations, $T = \{u_k, \varepsilon_k^-, a, b, e, 1 \cdot 10^{-\ldots}\}$
is the set of terms, and $F = \{+, -, *, /\}$ is the set of operators.

As shown in Fig. 8.4, the estimate of the state $x_2$ approaches the real
state for the proposed observer but not for the extended Kalman filter.
Further simulation results have shown that the proposed observer has a larger
domain of attraction than the extended Kalman filter, i.e., the initial
estimation error may be larger. As has been mentioned, even if the initial
state estimate is known, there is still the problem of model uncertainty,
e.g., parameter uncertainty.




Fig. 8.4. Real state $x_2$ (solid line) and its estimates (dashed line) obtained for
the proposed observer (a) and the extended Kalman filter (b)



Consider the non-linear system (8.9)-(8.11) and assume that the values of
the model parameters are $a = 0.53$, $b = 0.14$, $c = 0.78$, $d = 2.0$,
$e = 0.0099$, and the other parameters are the same as those analysed
previously. For the sake of comparison, it is assumed that the initial
a priori state estimate is close to the real state so as to ensure the
stability of the extended Kalman
filter, i.e., x^ = 1. 

Fig. 8.5. State estimation error for the proposed observer
(solid line) and the extended Kalman filter (dashed line)



As shown in Fig. 8.5, the state estimation error is closer to zero for the
proposed observer than for the extended Kalman filter. Further simulation
results have shown that the proposed observer is less sensitive to
model uncertainty than the extended Kalman filter (Witczak et al., 1999).

8.4.2. Designing the robust residual generator using multi-objective 
optimization and evolutionary algorithms 

Let us consider a system described by the following equations: 

$$\dot{x}(t) = A x(t) + B u(t) + d(t) + R_1 f(t) + \zeta(t), \tag{8.12}$$

$$y(t) = C x(t) + D u(t) + R_2 f(t) + \eta(t), \tag{8.13}$$

where $x(t) \in \mathbb{R}^n$ is the state vector, $u(t) \in \mathbb{R}^r$ is
the input vector of control variables, $y(t) \in \mathbb{R}^m$ is the measured
output, and $f(t) \in \mathbb{R}^g$ is an unknown function of time
representing the fault vector. $A$, $B$, $C$ and $D$ are matrices of system
parameters. The vector $d(t)$ is the disturbance vector, which can include
model uncertainty. $\eta(t)$ and $\zeta(t)$ represent the noises of the
sensors and the input signals, respectively. The matrices $R_1$ and $R_2$ are
general distribution matrices of faults. They describe the fault's influence
on the system. Depending on the type of the fault, these matrices have the
following forms:

$$R_1 = \begin{cases} 0, & \text{sensor faults}, \\ B, & \text{controller faults}, \end{cases} \qquad R_2 = \begin{cases} I_m, & \text{sensor faults}, \\ D, & \text{controller faults}. \end{cases}$$

The full observer of the system (8.12), (8.13) can be described as follows
(Fig. 8.6): 



$$\dot{\hat{x}}(t) = (A - KC)\hat{x}(t) + (B - KD)u(t) + K y(t), \tag{8.14}$$

$$\hat{y}(t) = C\hat{x}(t) + D u(t), \tag{8.15}$$

$$r(t) = Q[y(t) - \hat{y}(t)], \tag{8.16}$$

where $r \in \mathbb{R}^p$ is the residual vector, and $\hat{x}$ and $\hat{y}$
are the state and output estimates, respectively. The matrix
$Q \in \mathbb{R}^{p \times m}$ is the weight factor of the residual signals,
which is constant in most cases, although, generally, it can vary.
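
The behaviour of the residual generator (8.14)-(8.16) can be illustrated numerically. The sketch below is written under several assumptions that are not part of the text: the continuous-time equations are integrated with the explicit Euler method, noise and disturbances are omitted, and only an additive sensor fault ($R_1 = 0$, $R_2 = I_m$) is simulated. The function name and its arguments are hypothetical.

```python
import numpy as np


def simulate_residual(A, B, C, D, K, Q, u, fault, x0, xhat0, dt, n_steps):
    """Euler simulation of the plant, the full observer and the residual."""
    x = np.array(x0, dtype=float)              # plant state
    xh = np.array(xhat0, dtype=float)          # observer state
    residuals = []
    for k in range(n_steps):
        t = k * dt
        uk = u(t)
        y = C @ x + D @ uk + fault(t)          # measured output with sensor fault
        yh = C @ xh + D @ uk                   # output estimate (8.15)
        residuals.append(Q @ (y - yh))         # residual (8.16)
        x = x + dt * (A @ x + B @ uk)          # plant without noise/disturbance
        xh = xh + dt * ((A - K @ C) @ xh + (B - K @ D) @ uk + K @ y)  # (8.14)
    return np.array(residuals)
```

With identical initial states the residual stays at zero until the fault occurs and then settles at a non-zero value, which is the property exploited in residual evaluation.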



[Figure: block diagram; surviving labels: disturbances d(t), faults f(t)]




Fig. 8.6. Robust residual generator based on the full observer 

Using the model (8.12), (8.13) and the observer description (8.14)-(8.16),
it can be shown that the residual vector (if the noises are neglected)
has the form

$$r(s) = Q\{R_2 + C(sI - A + KC)^{-1}(R_1 - K R_2)\} f(s) + QC(sI - A + KC)^{-1}[d(s) + e(0)], \tag{8.17}$$

where $e(0)$ is the initial state estimation error. The residual signal is
affected by the measurement and input noises, too. In the system (8.12),
(8.13), the influence of the measurement noise $\eta(t)$ on the residual
vector has the same character as the term $R_2 f(t)$. An analogous conclusion
can be drawn for the input noise $\zeta(t)$ and the term $R_1 f(t)$. These
terms can be distinguished only in the frequency domain. For an incipient
fault signal, the fault information is contained within a low frequency band,
and the fault development is slow. However, the noise comprises mainly high
frequency signals. Both effects can be separated using different
frequency-dependent weighting penalties. Finally, four performance indices can
be defined (Chen and Patton, 1999):



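$$J_1(W_1, K, Q) = \sup_{\omega \in [\omega_1, \omega_2]} \bar{\sigma}\left\{ W_1(j\omega) \left[ Q R_2 + Q C (j\omega I - A + KC)^{-1} (R_1 - K R_2) \right]^{-1} \right\}, \tag{8.18}$$

$$J_2(W_2, K, Q) = \sup_{\omega \in [\omega_1, \omega_2]} \bar{\sigma}\left\{ W_2(j\omega)\, Q C (j\omega I - A + KC)^{-1} \right\}, \tag{8.19}$$

$$J_3(W_3, K, Q) = \sup_{\omega \in [\omega_1, \omega_2]} \bar{\sigma}\left\{ W_3(j\omega)\, Q \left[ I - C (j\omega I - A + KC)^{-1} K \right] \right\}, \tag{8.20}$$

$$J_4(K) = \left\| (A - KC)^{-1} \right\|_{\infty}, \tag{8.21}$$

where $\bar{\sigma}\{\cdot\}$ is the maximal singular value, and
$\{W_i(j\omega) \mid i = 1, 2, 3\}$ are the weighting penalties which separate
the effects of noise and faults in the frequency domain.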

Unfortunately, conventional optimization methods are ineffective in solving
the above multi-objective problem (8.18)-(8.21). An alternative solution
is to use a genetic algorithm (Chen and Patton, 1999). In this case,
a string of real values which represent the elements of the matrices
$\{W_i \mid i = 1, 2, 3\}$, $K$ and $Q$ is chosen as an individual.
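
When an individual encodes $K$ and $Q$, each index has to be evaluated numerically. The sketch below shows one possible way of approximating the index (8.19): the supremum is replaced by a maximum over a finite frequency grid, which is an assumption of this illustration, and the weighting penalty is passed as a Python function `W2` returning a matrix for a given frequency.

```python
import numpy as np


def max_singular_value(matrix):
    """Largest singular value of a (possibly complex) matrix."""
    return np.linalg.svd(matrix, compute_uv=False)[0]


def j2_index(A, C, K, Q, W2, omegas):
    """Grid approximation of J2 = sup sigma_max{W2(jw) Q C (jwI - A + KC)^-1}."""
    n = A.shape[0]
    values = []
    for w in omegas:
        resolvent = np.linalg.inv(1j * w * np.eye(n) - A + K @ C)
        values.append(max_singular_value(W2(w) @ Q @ C @ resolvent))
    return max(values)
```

For the scalar case $A = -1$, $C = K = Q = 1$ and $W_2 = 1$ the expression reduces to $|1/(j\omega + 2)|$, whose largest value, 0.5, is attained at $\omega = 0$.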



8.4.3. Evolutionary algorithms in the design of neural models 

Artificial neural networks are one of the most frequently used techniques in
designing diagnosis systems. They are effective when there is no analytical
model of the diagnosed system. There are many techniques for constructing
static neural models for non-linear systems (Duch et al., 2000; Hertz et al.,
1991; Korbicz et al., 1994), but their application to dynamic systems
modelling requires solving additional problems. Dynamic neural networks
are a suitable solution. They can be constructed using feedforward,
multilayered networks with additional global (between layers) or local (in the
neuron model) feedback connections (Korbicz et al., 1994; Patan and Korbicz,
2000). Let us assume that $y(k)$ of the form

$$y(k) = f\big(u(k), u(k-1), \ldots, u(k-n), y(k-1), \ldots, y(k-n')\big) \tag{8.22}$$

is a response of a dynamic non-linear system $f(\cdot)$ to an input signal
$u(k)$, and $k \in K$ is the discrete time. Let
$\Omega = \{u : K \to \mathbb{R}^n\}$ be the family of all possible maps
(infinitely many) from the discrete time domain $K$ to the input signal space
$\mathbb{R}^n$. Our goal is to construct a neural model (with the
architecture $A$ and the set of free parameters $v$) in the form

$$y_{A,v}(k) = f_{A,v}\big(u(k), u(k-1), \ldots, u(k-n_A), y_{A,v}(k-1), \ldots, y_{A,v}(k-n'_A)\big). \tag{8.23}$$

On the basis of the system description (8.22), the problem of designing
the neural model is connected with the minimization of some cost function
$\sup_{u(k) \in \Omega} J_T\big(y_{A,v}(k), y(k) \mid k \in K\big)$. Thus, the
following pair is searched for:

$$(A^{\mathrm{opt}}, v^{\mathrm{opt}}) = \arg\min \Big( \sup_{u(k) \in \Omega} J_T\big(y_{A,v}(k), y(k) \mid k \in K\big) \Big). \tag{8.24}$$

Practically, the solution $(A^{\mathrm{opt}}, v^{\mathrm{opt}})$ of the
problem (8.24) cannot be achieved because of the infinite cardinality of the
set $\Omega$. In order to obtain an estimate $(A^*, v^*)$ of the solution, two
finite subsets $\Omega_L, \Omega_T \subset \Omega$,
$\Omega_L \cap \Omega_T = \emptyset$, are selected. The set of training
signals $\Omega_L$ is used to calculate the best vector $v^*$ for a given
model architecture $A$:

$$v^* = \arg\min_{v \in V} \Big( \sup_{u(k) \in \Omega_L} J_L\big(y_{A,v}(k), y(k) \mid k \in K\big) \Big), \tag{8.25}$$

where $V$ is a space of network parameters. Generally, the cost functions
for the learning process $J_L\big(y_{A,v}(k), y(k) \mid k \in K\big)$ and the
testing process $J_T\big(y_{A,v}(k), y(k) \mid k \in K\big)$ can have
different definitions. The set of testing signals $\Omega_T$ is used to select
a network architecture from all possible architectures
$\mathcal{A} = \{A\}$, in order to minimize the following criterion:

$$A^* = \arg\min_{A \in \mathcal{A}} \Big( \sup_{u(k) \in \Omega_T} J_T\big(y_{A,v^*}(k), y(k) \mid k \in K\big) \Big). \tag{8.26}$$

Of course, the solutions of the problems (8.25) and (8.26) need not be
unique. If there are several network architectures which satisfy
the assigned criterion, then the model with the minimal number of free
parameters is chosen as the solution.

Searching for solutions of the tasks (8.25) and (8.26) is not trivial. The
network architecture and the training process strongly influence the
modelling quality. The main problem is connected with the relation between the
learning and generalization abilities, and the finite cardinality of the set
of learning signals. If the architecture is too simple, then the obtained
network input/output mapping may be unsatisfactory. On the other hand, if the
architecture is too complex, then the obtained network mapping strongly
depends on the actual set of training signals.
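
The two-stage procedure (8.25)-(8.26), together with the rule of preferring the model with the fewest free parameters, can be sketched as below. To keep the example short and self-contained, several simplifying assumptions are made: static polynomial models stand in for neural networks (the "architecture" is the polynomial degree), training is done by ordinary least squares on the pooled training signals instead of minimizing the supremum in (8.25), and $J_T$ is the mean squared error. All names are hypothetical.

```python
import numpy as np


def train(degree, training_signals):
    """Stage (8.25), simplified: least-squares fit for a fixed architecture."""
    u = np.concatenate([sig_u for sig_u, _ in training_signals])
    y = np.concatenate([sig_y for _, sig_y in training_signals])
    return np.polyfit(u, y, degree)


def worst_case_cost(coeffs, signals):
    """Supremum over the finite signal set of the mean squared error."""
    return max(np.mean((np.polyval(coeffs, u) - y) ** 2) for u, y in signals)


def select_architecture(degrees, training_signals, testing_signals, tol=1e-9):
    """Stage (8.26): pick the architecture by its cost on the testing set."""
    results = []
    for degree in degrees:
        coeffs = train(degree, training_signals)
        results.append((worst_case_cost(coeffs, testing_signals), degree, coeffs))
    best_cost = min(cost for cost, _, _ in results)
    # Among (numerically) equally good models take the fewest parameters.
    candidates = [(deg, c) for cost, deg, c in results if cost <= best_cost + tol]
    return min(candidates, key=lambda item: item[0])
```

For data generated by $y = u^2$ the polynomials of degree 2, 3 and 4 all reach a practically zero testing cost, and the rule of minimal complexity returns degree 2.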

The application of evolutionary algorithms to the design of neural tools
has often been described in the literature of the last decade (cf. Obuchowicz,
2000; Rutkowska, 2000). Generally, evolutionary algorithms as global
optimization methods can be applied to the following tasks connected with the
construction of neural models:

• choosing the set $v$ of free parameters of the network with a given
architecture (training process);

• searching for an optimal neural model architecture; the process of
training the tested architectures is performed using other known methods,
e.g., the back-propagation algorithm;

• solving both tasks, architecture allocation and training,
simultaneously.



Training process 

Training processes in most neural network implementations are based on
the gradient descent method. These algorithms belong to the class of local
optimization methods. The advantage of evolutionary training over the
back-propagation technique has been revealed by simulation results presented
in many works (cf. Kwasnicka, 1999). There are many programs which use
evolutionary methods in the neural network training process. For example, the
FlexTool toolbox is based on genetic algorithms with binary-coded
chromosomes, which contain information about all weights of network
connections. The advantage of the Evolver program is the floating-point
representation of an individual. This representation is natural for
optimization tasks in a real domain (Rutkowska, 2000).

Evolutionary algorithms are very attractive especially in the case of
training dynamic neural networks, for which the gradient-descent-based methods
are limited to a narrow class of networks. One of the most interesting
solutions in dynamic systems modelling is the application of a neural network
based on dynamic neuron models. This is a multilayered feedforward network
of processing units which contain an additional module: the Infinite Impulse
Response (IIR) filter. This filter is located between the adder and activation
modules. The multilayered architecture of the network allows constructing the
Extended Dynamic Back-Propagation (EDBP) algorithm (Patan and Korbicz, 2000)
for training both the synaptic connection weights and the parameters of the
IIR filters. The training of the dynamic neural network is an optimization
problem with a very rich topology of the network square-error function. The
EDBP algorithm usually finds an unsatisfactory local optimum. The
effectiveness of the multi-start method is very low, too. Patan and Jesionka
(1999) use the genetic algorithm to solve this problem, as described in the
following example.

Example 8.2 Let us consider the following dynamic system:

Our goal is to construct a neural model for the system (8.27). 

In order to apply the genetic algorithm to dynamic neural network training,
the chromosome should contain information about synaptic connection
weights, parameters of the IIR filter of each neuron, and the slope parameter
of its activation function. The chromosome length depends on the number of
bits attributed to each parameter.
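
The three groups of parameters listed above (connection weights, IIR filter parameters and the slope of the activation function) can be seen in the following sketch of a single processing unit. The exact filter structure is not given in this section, so a first-order difference equation $s(k) = b_0 x(k) + b_1 x(k-1) - a_1 s(k-1)$ and a hyperbolic tangent activation are assumed here purely for illustration; the class and attribute names are hypothetical.

```python
import math


class DynamicNeuron:
    """Adder -> first-order IIR filter -> activation with a slope parameter."""

    def __init__(self, weights, b0, b1, a1, slope):
        self.weights = list(weights)           # synaptic connection weights
        self.b0, self.b1, self.a1 = b0, b1, a1  # assumed IIR filter parameters
        self.slope = slope                     # slope of the activation function
        self._x_prev = 0.0                     # previous adder output
        self._s_prev = 0.0                     # previous filter output

    def step(self, inputs):
        """Process one time step and return the unit's output."""
        x = sum(w * u for w, u in zip(self.weights, inputs))            # adder
        s = self.b0 * x + self.b1 * self._x_prev - self.a1 * self._s_prev
        self._x_prev, self._s_prev = x, s
        return math.tanh(self.slope * s)                                # activation
```

With $b_0 = 1$ and $b_1 = a_1 = 0$ the filter is transparent and the unit reduces to an ordinary static neuron.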

Let us consider a network with one input unit, one output unit, and five
hidden units. Each processing unit contains an IIR filter of the first order.
Thus, we have 40 network parameters to adjust. If the value of each parameter
is limited to the interval $[-1, 1]$ and represented with accuracy equal to
$10^{-5}$, then there are 18 bits per parameter and the chromosome length
amounts to 720 bits. The training set has the form
$\{(u(k), y(k)) \mid k = 1, \ldots, N\}$. The fitness function is chosen as
follows:

$$\Phi(a_j(t)) = -e(v_j(t)) + e_{\max}(t) + 0.01\,\big(e_{\max}(t) - e_{\min}(t)\big), \tag{8.28}$$

where

$$e(v_j(t)) = \sum_{k=1}^{N} \big(y(k) - y_{v_j(t)}(k)\big)^2,$$

$$e_{\max}(t) = \max_j e(v_j(t)), \qquad e_{\min}(t) = \min_j e(v_j(t)),$$

and $v_j(t)$ is the vector of parameters of the dynamic neural network encoded
in the chromosome $a_j(t)$, $j = 1, \ldots, \eta$, and $y_{v_j(t)}(k)$ is the
network's response to the $k$-th training pattern. The last term in (8.28)
amplifies the diversity between the best and the worst fitness in the
population, and thus the effectiveness of the applied proportional selection
increases. The following values of the algorithm parameters are chosen:

• the maximal number of iterations: $t_{\max} = 10000$,

• the population size: $\eta = 20$,

• the probability of crossover: $\theta_r = 0.7$,

• the probability of mutation: $\theta_m = 0.007$,

• the number of training patterns: $N = 50$.
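
The chromosome length quoted in this example can be checked with a short calculation. The sketch below assumes a uniform binary encoding of each parameter on a grid over $[-1, 1]$ and an accuracy of $10^{-5}$, which is consistent with the 18 bits per parameter given above; the function names are hypothetical.

```python
import math


def bits_per_parameter(lo, hi, accuracy):
    """Smallest number of bits whose grid on [lo, hi] is at least this fine."""
    return math.ceil(math.log2((hi - lo) / accuracy + 1))


def decode(bits, n_bits, lo, hi):
    """Decode a chromosome (list of 0/1) into real parameters from [lo, hi]."""
    values = []
    for start in range(0, len(bits), n_bits):
        gene = bits[start:start + n_bits]
        as_int = int("".join(str(b) for b in gene), 2)
        values.append(lo + (hi - lo) * as_int / (2 ** n_bits - 1))
    return values
```

For the interval $[-1, 1]$ about $2 \cdot 10^5$ grid points are needed, and $2^{17} < 2 \cdot 10^5 < 2^{18}$, hence 18 bits per parameter and $40 \cdot 18 = 720$ bits in total.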

The following testing signals are chosen:

[Equations (8.29) and (8.30): sinusoidal testing signals $u(k)$ with terms of
the form $\sin(2\pi k / \cdot)$; (8.30) is defined piecewise, with one branch
for $k < 250$]




Figure 8.7 presents the testing results of a dynamic neural network trained
by the genetic algorithm. These results are not satisfactory. If, however,
this solution is used as the starting point for the EDBP algorithm, a
high-quality neural model is eventually obtained (Fig. 8.8).





Fig. 8.7. Response of the system (solid line) and of the
dynamic neural network trained by the genetic algorithm
(dashed line) to the testing signals (8.29) (a) and (8.30) (b)

The common characteristic of the well-known evolutionary techniques 
used in the neural network training process is the off-line mode of their pro- 
cessing, i.e. the fitness function is calculated using errors for all training pat- 
terns. This fact and the evaluation process of all individuals of the population 
in each iteration result in a high numerical complexity of the algorithm. The 
ESSS-FDM algorithm is applied to on-line dynamic neural network training 
(Obuchowicz, 1999). The following example presents the effectiveness of this 
method. 



Example 8.3 Let us consider the following non-linear system:

$$f(x_1, x_2, x_3, x_4, x_5) = \frac{y(k-1)\,y(k-2)\,u(k)\,u(k-2)\,\big(u(k) - 1\big) + u(k-1)}{1 + y(k-2)^2 + u(k)^2}. \tag{8.31}$$
Fig. 8.8. Response of the system (solid line) and of the dynamic neural
network trained by the combined method, i.e., the genetic and EDBP
algorithms (dashed line), to the testing signals (8.29) (a) and (8.30) (b)



The neural network architecture contains one input unit, one output unit
and five hidden units. Each unit possesses an IIR filter of the second order.
The parameters of the ESSS-FDM algorithm are as follows: the population
size $\eta = 20$, the momentum $\mu = 0.0545$, the maximal number of iterations
$t_{\max} = 5000$, and the mean standard deviation of the mutation
$\sigma = 0.075$ for $t < 200$ and $\sigma = 0.015$ for $t > 200$. For the
on-line training process, 5000 training patterns are generated. The following
criterion is used as the quality measure (Obuchowicz, 1999):



$$J_T = \frac{\sum_{p=1}^{P} \big(y^d(p) - y(p)\big)^2}{\sum_{p=1}^{P} \big(y^d(p)\big)^2}, \tag{8.32}$$

where $y^d(p)$ and $y(p)$ are the desired output and the actual output of the
neural model, respectively, and $P$ is the size of the testing set.

The system's and the neural network's responses to the set of testing signals
are shown in Fig. 8.9. The quality index $J_T$ (8.32) is lower than 0.07 for
each testing signal. The best value $J_T = 0.0058$ has been obtained for the
testing signal (8.30). This is the best result known in the literature for
this system and this testing signal (Obuchowicz, 1999).
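The quality index (8.32) is simply the squared modelling error normalized by the energy of the desired output. A minimal Python sketch is given below; the function name and the sample sequences are hypothetical, and the squared-error form of both sums is assumed.

```python
def quality_index(desired, actual):
    """Normalized sum of squared errors, cf. J_T in (8.32)."""
    if len(desired) != len(actual):
        raise ValueError("sequences must have equal length")
    numerator = sum((d - a) ** 2 for d, a in zip(desired, actual))
    denominator = sum(d ** 2 for d in desired)
    return numerator / denominator

# A perfect model gives 0; a model that always outputs zero gives 1.
perfect = quality_index([1.0, -2.0, 0.5], [1.0, -2.0, 0.5])
silent = quality_index([1.0, -2.0, 0.5], [0.0, 0.0, 0.0])
```

Under this normalization a value such as 0.07 means that the error energy is 7% of the energy of the desired signal.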

Optimization of the neural model architecture 

From among all known EAs, genetic algorithms seem to be the most natural 
tool for searching a discrete space of ANN architectures. This fact follows 
from the classical structure of a chromosome - a string of elements from a 
discrete set, e.g., a binary set. 
[Fig. 8.9 consists of six panels, one per testing signal, with the quality
indices $J_T$ = 0.0059, 0.01658, 0.01639, 0.06991, 0.02006 and 0.0222.]

Fig. 8.9. Response of the system (solid line) and of the neural
model (dashed line) to the set of testing signals

The most popular representation of the ANN architecture is a binary
string (Obuchowicz, 2000). At first, an initial architecture $A_{\max}$ must be
chosen. This architecture must be sufficient to realize the desired
input-output relation. $A_{\max}$ defines the upper limit of the complexity of
the searched architectures. Next, all units of $A_{\max}$ have to be numbered
from 1 to $N$. In this way, the search space of ANN architectures is limited to
the class of all digraphs of $N$ nodes. Any architecture $A$ (a graph) of this
class is represented by its connection matrix $V$ with elements equal to 0 or
1. If $V_{ij} = 1$, then there exists a synaptic connection from the i-th unit
to the j-th one; $V_{ij} = 0$ otherwise. A chromosome is constructed by
rewriting the matrix $V$ row by row into a bit string of length $N^2$. With
such a representation of an ANN architecture, a standard GA can be used. It is
easy to see that the above representation can describe an ANN of any
architecture: feedforward networks as well as recurrent ones. If the class of
the analysed networks is limited to MLPs, then the matrix $V$ contains many
elements that are equal to 0 and cannot be changed during the search process.
Carrying these fixed elements complicates the genetic operations and wastes
computer memory. Thus, passing over these elements in the representation is
sensible.
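The direct encoding described above can be illustrated as follows. This is only a sketch in Python with hypothetical function names; the three-unit chain used as data is an invented example.

```python
import math

def matrix_to_chromosome(v):
    """Rewrite an N x N 0/1 connection matrix row by row into a bit string."""
    n = len(v)
    if any(len(row) != n for row in v):
        raise ValueError("connection matrix must be square")
    return "".join(str(bit) for row in v for bit in row)

def chromosome_to_matrix(bits):
    """Inverse operation: cut a bit string of length N^2 into N rows."""
    n = math.isqrt(len(bits))
    if n * n != len(bits):
        raise ValueError("chromosome length must be a perfect square")
    return [[int(b) for b in bits[i * n:(i + 1) * n]] for i in range(n)]

# Three units connected in a chain 1 -> 2 -> 3 (a feedforward structure).
V = [[0, 1, 0],
     [0, 0, 1],
     [0, 0, 0]]
chromosome = matrix_to_chromosome(V)
```

For N units the chromosome has N^2 bits, which shows why the representation becomes very long for networks of practical size.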

In practical applications an ANN usually has from hundreds to thousands of
synaptic connections, and the binary code representing such an ANN
architecture is very long. As a result, standard genetic operations are not
effective: the convergence of the genetic process deteriorates as the
complexity of the ANN architecture increases. Thus a simplification of the
network architecture representation is needed. One of the possible solutions
is a genetic representation of the ANN architecture in the form of the
connection matrix $V$. The crossover operator is defined as the exchange of
randomly chosen rows or columns between two matrices. In the case of mutation,
each bit is flipped
with some (very small) probability.
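The matrix operators mentioned above can be sketched in a few lines. The Python fragment below is illustrative only: the function names are hypothetical, only the row variant of the crossover is shown, and the swap probability of 0.5 per row is an assumption, since the text does not say how the rows are chosen.

```python
import random

def row_crossover(parent_a, parent_b, rng):
    """Exchange randomly chosen rows between two connection matrices."""
    child_a = [row[:] for row in parent_a]
    child_b = [row[:] for row in parent_b]
    for i in range(len(parent_a)):
        if rng.random() < 0.5:             # assumed: each row is swapped with probability 0.5
            child_a[i], child_b[i] = child_b[i], child_a[i]
    return child_a, child_b

def mutate(matrix, p_mut, rng):
    """Flip every bit independently with the (small) probability p_mut."""
    return [[1 - bit if rng.random() < p_mut else bit for bit in row]
            for row in matrix]

rng = random.Random(0)
A = [[0, 1], [1, 0]]
B = [[1, 1], [0, 0]]
child_a, child_b = row_crossover(A, B, rng)
mutant = mutate(A, 0.01, rng)
```

Because whole rows are exchanged, all outgoing connections of a unit are inherited together, which is the main difference from the bit-string crossover.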

The above methods of the genetic representation of the ANN architecture
are called direct encoding. This term indicates that each bit represents one
synaptic connection in the ANN structure. A disadvantage of these methods
is the slow convergence of the genetic process, or a lack of convergence in
the limit of very large architectures. Furthermore, if the initial
architecture $A_{\max}$ is very complex, then the result of such a genetic
search is rarely optimal, which can be characterized by a compression level.
The efficiency of the method can be measured by the so-called compression
index of the form

$$\kappa = \frac{\eta^{*}}{\eta_{\max}} \times 100\%, \tag{8.33}$$

where $\eta^{*}$ is the number of synaptic connections in the resulting
architecture, and $\eta_{\max}$ is the maximal number of connections acceptable
in a given architecture representation.



Example 8.4 In order to illustrate the compression ability of the genetic
approach with direct encoding, let us consider an MLP which implements the
logical conjunction (AND), inclusive OR and exclusive OR (XOR) of two
bits. The number of the input and output units is defined by the dimension
of the input vector $u \in \mathbb{R}^2$ and the output vector
$y \in \mathbb{R}^3$. For simplicity, the network architecture is limited to
one hidden layer (Fig. 8.10). The problem is reduced to determining the number
$\nu$ of hidden neurons and the number $\eta$ of synaptic connections.

Fig. 8.10. Architecture of a network implementing
three logical functions of two bits

The quality criterion is defined as follows:

$$J_T(A) = \beta\,\eta(A) + \gamma\,\ldots, \tag{8.34}$$

where $\beta$ and $\gamma$ are weight parameters, deciding which factor is more
essential.

The genetic algorithm is used to solve the architecture optimization
problem. A binary string of length $l$ is used as a chromosome, i.e., each
element of the population belongs to the space $I = \{0, 1\}^l$. The maximized
fitness function is transformed to non-negative values:

$$\Phi(A) = a - J_T(A), \tag{8.35}$$

where the cost $J_T$ is defined by (8.34). A given maximal number of
iterations is used as the stop criterion. The probabilities of crossover and
mutation are $\theta_r = 0.6$ and $\theta_m = 0.033$ for each bit,
respectively. The training process is performed using the back-propagation
algorithm. Figure 8.11 shows the compression ability (8.33) of the algorithm
for different initial numbers of hidden neurons.

An alternative class of genetic representations of ANN architectures is 
indirect encoding (Obuchowicz, 2000). One of the possible solutions is the 
binary encoding of the parameters of an MLP architecture (the number of 
hidden layers, the number of hidden neurons in each layer, etc.) and the 
parameters of the BP algorithm used for training this MLP (the training 
factor, the momentum factor, the desired accuracy, the maximal number of 
iterations, etc.). A discrete finite set of values is defined for each parameter, 
and the cardinality of this set depends on the number of bits assigned to a 
given parameter. In this case the genetic process searches not only for an
optimal architecture but also for an optimal training process.
Fig. 8.11. Fitness of the best element in the population vs. time for the GA
and different initial numbers of hidden neurons $\nu = 2, 3, \ldots, 10$
($\nu = 2$ for the top curve and $\nu = 10$ for the bottom curve; for the
efficiency comparison $a = 0$ in equation (8.35)). The height of the curve's
fault is proportional to the compression index $\kappa$ (8.33)



Another proposition is graph-based encoding. Let the search space be
limited to architectures which contain at most $2^{h+1}$ units. Then the
connection matrix can be represented by a tree of $h$ levels, in which each
node either possesses four successors or is a leaf. Each leaf is one of the 16
possible $2 \times 2$ matrices of binary elements. The four leaves of a given
node of the level $h-1$ define a $4 \times 4$ matrix, etc. In this way the
root of the tree represents the
whole connection matrix. The crossover and mutation operators are defined
in the same way as in the GP method (Fig. 8.2). The GP algorithm can also 
be used to design neural models if the mapping realized by a single output 
unit is represented by an analytical formula which is searched for by the GP. 
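The tree representation of the connection matrix can be made concrete with a short recursive routine. The Python sketch below is an illustration under stated assumptions: the function name is hypothetical, a leaf is stored as a list of two rows, an inner node as a tuple of four subtrees, and the quadrant order (top-left, top-right, bottom-left, bottom-right) is an assumption, since the text does not fix it. All subtrees of a node are assumed to have the same depth.

```python
def expand(node):
    """Expand a quad-tree into a square 0/1 connection matrix (list of rows).

    A leaf is a 2 x 2 matrix given as a list of two rows; an inner node is a
    tuple of four subtrees in the assumed order (top-left, top-right,
    bottom-left, bottom-right).
    """
    if isinstance(node, list):             # leaf: copy the 2 x 2 block
        return [row[:] for row in node]
    tl, tr, bl, br = (expand(child) for child in node)
    top = [left + right for left, right in zip(tl, tr)]
    bottom = [left + right for left, right in zip(bl, br)]
    return top + bottom

leaf_zero = [[0, 0], [0, 0]]
leaf_id = [[1, 0], [0, 1]]
tree = (leaf_id, leaf_zero, leaf_zero, leaf_id)
matrix = expand(tree)                      # 4 x 4 matrix built from four leaves
```

Exchanging subtrees between two such trees, as in genetic programming, exchanges whole square blocks of the connection matrix.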



8.5. Symptom evaluation 

The main goal of the symptom evaluation module is to decide whether a fault 
occurs, and what location and range it has. At the same time, the probability 
of making a false decision should be as low as possible. In most cases, the
decision can be made using the so-called threshold test, which checks whether
the actual residual value exceeds an alarm threshold. Many statistical
techniques (Basseville and Nikiforov, 1993) have also been proposed. There
are, however, many cases for which such simple solutions are not satisfactory.
For example, the so-called soft faults (Chen and Patton, 1999), which develop
very slowly, with residual changes invisible to the operator, require more
sophisticated methods. Artificial intelligence methods (Frank and
Koppen-Selinger, 1997) seem
to be a proper tool in these cases. Evolutionary algorithms can increase the
effectiveness of these methods. 
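The threshold test mentioned above amounts to a single comparison per sample. A minimal Python sketch follows; the function name, the residual values and the threshold of 0.1 are invented for illustration.

```python
def threshold_test(residuals, threshold):
    """Return one alarm flag per sample: True when |r(k)| exceeds the threshold."""
    return [abs(r) > threshold for r in residuals]

# Hypothetical residual sequence: small noise first, then an abrupt fault.
residuals = [0.01, -0.02, 0.03, 0.4, 0.5]
alarms = threshold_test(residuals, 0.1)
```

A slowly developing (soft) fault would keep the residual below the threshold for a long time, which is why such a fixed test is not sufficient in that case.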
8.5.1. Genetic clustering

In the case of a multidimensional vector of residual signals, the effectiveness
of the fault diagnosis system can be improved using preprocessing, which
divides the symptom space into subspaces (clusters). Among many well-known
preprocessing methods, evolutionary algorithms are marked by high
efficiency (Kosiński et al., 1998).

Let us consider multidimensional real data ordered in training pairs:

$$T = \{ p_q = (x_q, y_q) \mid q = 1, \ldots, P \}. \tag{8.36}$$

The goal is to divide the data $T$ into clusters $\mathcal{K}_h$, whose union
$C = \bigcup_{h=1}^{M} \mathcal{K}_h$ covers the set $T$. $M$ is the number,
usually unknown, of the clusters of the final division.

Taking into account the nature of the symptom space, a special type of
evolutionary algorithm is needed. An effective evolutionary technique was
proposed in (Kosiński et al., 1998). In this algorithm, the following quality
indices are used:

• the local evaluation function of a cluster $\mathcal{K}$ is the mean distance
of the cluster elements

$$\mathrm{eval}_l(\mathcal{K}) = \frac{1}{P} \sum_{q=1}^{P} d(p_q, p^{*}) \tag{8.37}$$

from its centre $p^{*} = \frac{1}{P} \sum_{q=1}^{P} p_q$, where the sums run
over the $P$ elements $p_q$ of the cluster;

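• the local fitness function of a cluster $\mathcal{K}$ is based on the local
evaluation function

$$\mathrm{fit}_l(\mathcal{K}) = \xi\,\frac{\mathrm{eval}_l(T)}{\mathrm{eval}_l(\mathcal{K})} + \gamma\,\frac{\mathrm{eval}_l(\mathcal{K})}{\mathrm{eval}_l(T)} + (1 - \xi - \gamma)\,\frac{\mathrm{card}(\mathcal{K})}{\mathrm{card}(T)}, \tag{8.38}$$

where $\xi \in [0, 1]$ and $\gamma \in [0, 1]$ are the parameters of the method;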

• the global evaluation function of the covering $C$ is defined using the
component local evaluation functions:

$$\mathrm{eval}_g(C) = \sum_{h=1}^{M} \mathrm{eval}_l(\mathcal{K}_h). \tag{8.39}$$
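The local and global evaluation functions can be sketched as follows. The Python fragment is illustrative only: the function names are hypothetical, the Euclidean distance is assumed as the distance measure, and the centre of a cluster is taken to be the arithmetic mean of its elements.

```python
import math

def centre(cluster):
    """Arithmetic mean of the points of a cluster (assumed cluster centre)."""
    dim = len(cluster[0])
    return [sum(p[i] for p in cluster) / len(cluster) for i in range(dim)]

def eval_local(cluster):
    """Mean Euclidean distance of the cluster points from their centre."""
    c = centre(cluster)
    return sum(math.dist(p, c) for p in cluster) / len(cluster)

def eval_global(covering):
    """Sum of the local evaluations over all clusters of a covering, cf. (8.39)."""
    return sum(eval_local(cluster) for cluster in covering)

# Invented two-dimensional data split into two clusters.
covering = [[(0.0, 0.0), (2.0, 0.0)], [(5.0, 5.0), (5.0, 9.0)]]
tight = eval_local(covering[0])
total = eval_global(covering)
```

Compact clusters yield small local values, so a covering with a low global value consists of clusters that are tight around their centres.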

The algorithm consists of two main stages. In the first stage, $m$ independent
evolutionary processes are carried out in order to obtain $m$ coverings.

1. Create $m$ coverings as follows:

(a) Create a histogram of the variability of $y$ by a particular choice of
discrete values from the range of $y$'s.

(b) Fuzzify the histogram by calculating its convolution with $m$ Gauss-like
functions $\phi(y, \sigma_i) = c_i \exp(-y^2/\sigma_i^2)$, where $c_i$ is the
normalization constant for $i = 1, 2, \ldots, m$.

(c) For each fuzzy histogram, its minima determine subintervals in the
space $Y$.

(d) For each fuzzy histogram, create a covering whose clusters consist of
all training pairs with the last (i.e., $y$) coordinate belonging to the
same subinterval.

2. During the evolutionary process, in each generation apply three evolution- 
ary (genetic type) operators that act on the clusters (or pairs of clusters): 

• the unification operator - produces a new cluster as a union of both 
parents, 

• the crossover operator - exchanges parts of two clusters, 

• the separation operator - splits a cluster into two (a parameter gives 
the position of the splitting). 

3. Evaluate each offspring cluster in the population using the local fitness 
function (8.38). 

4. Apply the selection procedure to the whole population: parent and chil- 
dren clusters, following the selection procedure described below, in the 
second stage of the algorithm. In this way a new covering is constructed 
that forms the population for the next generation. 

5. At the termination of the evolutionary process (e.g., following a fixed
number of generations, which is a parameter of the method), apply the
global evaluation function (8.39) to each covering. From the union of all
coverings select the set $\mathcal{P}$ of the best $k < m$ coverings (with
respect to (8.39)).
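Steps 1(b) and 1(c) of the first stage can be illustrated on a small histogram. The Python sketch below uses hypothetical function names and invented bin counts; the kernel is renormalized at every bin, which is one possible choice of the normalization constant and an assumption of this example.

```python
import math

def fuzzify(counts, sigma):
    """Smooth histogram counts with a Gauss-like kernel exp(-d^2 / sigma^2).

    Assumption: the kernel weights are renormalized at every bin, so that the
    borders of the histogram are not artificially lowered.
    """
    smoothed = []
    for i in range(len(counts)):
        weights = [math.exp(-((i - j) ** 2) / sigma ** 2)
                   for j in range(len(counts))]
        smoothed.append(sum(w * c for w, c in zip(weights, counts)) / sum(weights))
    return smoothed

def interior_minima(values):
    """Indices of local minima, which act as borders between subintervals."""
    return [i for i in range(1, len(values) - 1)
            if values[i] < values[i - 1] and values[i] <= values[i + 1]]

counts = [5, 6, 1, 7, 8]                   # invented histogram of y values
borders = [interior_minima(fuzzify(counts, s)) for s in (0.5, 50.0)]
```

A narrow kernel preserves the gap in the middle of the histogram and yields two subintervals, whereas a very wide kernel removes the minimum and leaves a single cluster; using m different widths therefore produces m different initial coverings.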

All clusters from the $k$ coverings from the set $\mathcal{P}$ take part in
the second stage:

(i) set the final covering $U$ empty, and assign values of the final
evaluation function (8.37) to each cluster from $\mathcal{P}$;

(ii) select a cluster $\mathcal{K}$ from the set $\mathcal{P}$ (proportional
selection method), remove it from $\mathcal{P}$, modify $\mathcal{K}$ by
removing from it all training pairs already included in the clusters present
in $U$, and insert it (after the modification) into
U. Repeat this step until $\mathcal{P}$ is empty.
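The second stage can be sketched as a roulette-wheel loop over the remaining clusters. In the Python fragment below the function name and the data are hypothetical; the selection weights are assumed to be given as positive numbers (how they are derived from (8.37) is not specified here), and clusters that become empty after the modification are simply skipped, which is also an assumption.

```python
import random

def build_final_covering(clusters, weights, rng):
    """Draw clusters by proportional selection and keep only uncovered pairs.

    clusters: list of sets of training-pair identifiers
    weights:  one positive selection weight per cluster (assumed to be given)
    """
    pool = list(zip(clusters, weights))
    covered, final = set(), []
    while pool:
        pool_weights = [w for _, w in pool]
        index = rng.choices(range(len(pool)), weights=pool_weights, k=1)[0]
        cluster, _ = pool.pop(index)
        remainder = cluster - covered      # drop pairs already present in U
        if remainder:                      # assumed: empty remainders are skipped
            final.append(remainder)
            covered |= remainder
    return final

rng = random.Random(1)
clusters = [{1, 2, 3}, {3, 4}, {4, 5, 6}, {1, 6}]
final = build_final_covering(clusters, [4.0, 1.0, 3.0, 2.0], rng)
```

Whatever the order of the draws, the resulting clusters are pairwise disjoint and together contain every training pair that appeared in the input clusters.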

The application of the above evolutionary algorithm can be treated as a 
preprocessing procedure for neural networks or fuzzy logic systems. In the 
second case the obtained division into clusters allows constructing fuzzy sets 
of premises for one-conditional rules with their membership functions. 

8.5.2. Evolutionary algorithms in designing the rule base 

Decision-making is usually connected with a comparison of the actual residual
signal with a given threshold. The application of a fixed threshold reduces
the symptom evaluation module to typical binary logic. A crucial disadvantage
of this method is a high probability of false alarms, which are caused
by measurement noise or by uncertainties of the applied model. An alternative
way of solving the decision-making problem is the application of artificial
intelligence methods. The concept of a diagnosis expert system, which
processes analytical and heuristic knowledge simultaneously, is very
interesting (Frank and Koppen-Selinger, 1997).

The basic part of the diagnosis expert system is the knowledge base,
which contains full information about a given process. The creation of the
rule base is the duty of a knowledge engineer, who implements the knowledge
acquired from experts. This kind of knowledge is usually poorly ordered,
incoherent, incomplete and heuristic. Fuzzy logic techniques seem to be a
suitable tool for solving this problem. Unfortunately, there are many
diagnostic tasks for which experts' knowledge is insufficient and an automatic
choice of decision rules is necessary. Because such an algorithmic problem is
of exponential complexity, exhaustive search methods cannot be used.
Evolutionary algorithms can be an attractive method of solving
these problems (Koza, 1992; Skowronski, 1998).

Decision rules are a set of complexes. Each complex is a conjunction of
selectors and each selector is a disjunction of discrete attribute values. The
vectors of selectors create a population. Three genetic operators are used:

• mutation - a random change of the value of a randomly chosen at- 
tribute from a randomly chosen complex; a uniform distribution is used for 
all random selections; 

• global crossover - the exchange of randomly chosen subsets of complexes 
between two parent sets of complexes; 

• local crossover - the exchange of randomly chosen attributes between 
two randomly chosen complexes of a given parent set of complexes. 
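The structure of such a rule set can be illustrated with a small data model. In the Python sketch below a complex is a dictionary mapping an attribute to the set of values admitted by its selector; the function names, attribute names and values are all hypothetical.

```python
def complex_matches(complex_, example):
    """A complex (conjunction) holds when every selector admits the example's value."""
    return all(example[attr] in allowed for attr, allowed in complex_.items())

def rule_base_fires(complexes, example):
    """A set of complexes fires when at least one complex matches the example."""
    return any(complex_matches(c, example) for c in complexes)

# Hypothetical rule set built from two complexes.
rule_base = [
    {"pressure": {"low"}, "flow": {"low", "normal"}},   # selectors joined by AND
    {"temperature": {"high"}},
]
sample = {"pressure": "low", "flow": "normal", "temperature": "normal"}
fires = rule_base_fires(rule_base, sample)
```

With this representation the mutation changes one value set, the global crossover exchanges whole dictionaries between two rule sets, and the local crossover exchanges individual entries between two dictionaries of the same rule set.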

An alternative way of creating a rule base is the application of genetic pro- 
gramming (Koza, 1992). First, the set of terms T and the set of operators F 
must be defined. The set of terms contains all possible premises and con- 
clusions, and the set of operators contains operators of the propositional 
calculus. Each rule is represented by a structural tree. The goal of the GP 
algorithm is to find the best set of rules. Unlike the genetic algorithm, where 
only simple rules are processed, GP allows creating complex rules. 

Another problem is the definition of the fitness function, which has to
evaluate the quality of the rule base for an evolutionary algorithm. One of
the most interesting proposals is the application of the Minimum Description
Length (MDL) technique (Skowronski, 1998), which is based on the results
of probability and encoding theories. By combining them, MDL provides a
general inference mechanism that can be interpreted informally as a
preference for hypotheses that best compress the training data. Suppose a data
set is to be transmitted to a receiver using some economical encoding. One way
of doing it is to transmit an encoding of the hypothesis first and then to
transmit the data assuming that the receiver already knows the hypothesis. The
encoding should be optimal in both cases. The total encoding length is the sum
of the encoding length of the hypothesis and of the data, given the
hypothesis. The MDL principle states that within a set of candidate hypotheses
one should prefer the one that minimizes the total description length. To come
up with a complete definition of the fitness function we need an estimate of
the encoding length for sets of rules (complexes). Assume that any attribute
will be used by a complex with the probability $p_a$. Assume also that the
value $v_{ji}$ of the attribute $A_j$ in any complex of any solution (for all
values of the attribute $A_j$) will be used in the complex with the
probability $p_j$. To encode the structure with a given attribute in the $m$
complexes (decision rules), we need a message of a length described by the
following equation:

$$L_s = m\,n\,\big( -p_a \log_2 p_a - (1 - p_a) \log_2 (1 - p_a) \big), \tag{8.40}$$

where $n$ is the number of attributes in the complex. Apart from information
about the structure of attributes used in the complexes for a given
hypothesis, we must transmit information about the mask (set of values) for
each attribute. To encode this we need a description of the following length:

$$L_v = \sum_{i=1}^{m} \sum_{j=1}^{n} S_a(i,j)\,k_j\,\big( -p_j \log_2 p_j - (1 - p_j) \log_2 (1 - p_j) \big), \tag{8.41}$$

where $S_a(i,j) = 1$ means that the attribute $A_j$ is used by the complex
$C_i$, and $k_j$ is the number of the values of the attribute $A_j$. The
complete formula is therefore as follows:

$$\mathrm{eval}(h) = L_s + L_v + b\,\big( \log_2(\mathrm{card}(T)) + \log_2(\mathrm{card}(C)) \big), \tag{8.42}$$

where b is the number of examples misclassified by the set of rules. 
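The description lengths (8.40)-(8.42) are built from the binary entropy function, which makes them easy to evaluate. The Python sketch below is illustrative: the function and argument names are hypothetical, and the meaning of card(C) is not fixed by the text, so it is passed in as a plain number.

```python
import math

def binary_entropy(p):
    """-p log2 p - (1 - p) log2 (1 - p), with the usual convention 0 log 0 = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def structure_length(m, n, p_a):
    """Length of the attribute-structure message, cf. (8.40)."""
    return m * n * binary_entropy(p_a)

def values_length(used, k, p):
    """Length of the value-mask message, cf. (8.41).

    used[i][j] is 1 when complex i uses attribute j, k[j] is the number of
    values of attribute j, and p[j] is the probability of using one value.
    """
    return sum(used[i][j] * k[j] * binary_entropy(p[j])
               for i in range(len(used)) for j in range(len(k)))

def mdl_eval(l_s, l_v, b, card_t, card_c):
    """Total description length, cf. (8.42); b counts misclassified examples."""
    return l_s + l_v + b * (math.log2(card_t) + math.log2(card_c))

l_s = structure_length(2, 3, 0.5)
l_v = values_length([[1, 0], [1, 1]], [2, 4], [0.5, 0.5])
total = mdl_eval(l_s, l_v, 2, 8, 4)
```

Each misclassified example adds a fixed penalty, so a larger rule set is preferred only if the errors it removes cost more bits than the extra rules.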

8.5.3. Genetic adaptation of fuzzy systems 

Fuzzy systems are often used for the approximation of multidimensional
systems or in control systems of industrial processes (Koppen-Selinger
and Frank, 1999). An example of a fuzzy system structure described by the
formula

$$y = \frac{\sum_{k=1}^{N} y^k \prod_{i=1}^{n} \mu_i^k(x_i)}{\sum_{k=1}^{N} \prod_{i=1}^{n} \mu_i^k(x_i)}, \tag{8.43}$$

where $\{\mu_i^k(x_i) \mid i = 1, \ldots, n;\ k = 1, \ldots, N\}$ are the
membership functions, is presented in Fig. 8.12. Fuzzy modelling consists of
two stages. First, membership functions and fuzzy rules must be defined. The
next stage consists in an optimal choice of the shape parameters of the
membership functions, e.g., their central points and slope parameters. Both
stages can be treated as optimization problems in the discrete and non-linear
domains, respectively, which can be solved using evolutionary algorithms.
There are many works in which evolutionary algorithms are used for fuzzy rule
base construction (Carse et al., 1996), but the problem of choosing the
membership functions is solved mainly by gradient-descent methods.

Fig. 8.12. Example of a fuzzy system structure
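The output formula (8.43) is a weighted average of the rule consequents, with the firing strengths of the rules as weights. A minimal Python sketch is given below; the function names are hypothetical and the Gaussian shape of the membership functions is an assumption made only for this illustration.

```python
import math

def gaussian(x, centre, width):
    """Assumed membership function shape: exp(-((x - centre) / width)^2)."""
    return math.exp(-((x - centre) / width) ** 2)

def fuzzy_output(x, rules):
    """Weighted average of rule consequents, cf. (8.43).

    rules: list of (consequent, [(centre, width), ...]) with one
    (centre, width) pair per input variable.
    """
    numerator = denominator = 0.0
    for consequent, sets in rules:
        strength = 1.0
        for xi, (centre, width) in zip(x, sets):
            strength *= gaussian(xi, centre, width)   # product over the inputs
        numerator += consequent * strength
        denominator += strength
    return numerator / denominator

# Two invented rules on one input; the point x = 1 lies halfway between them.
rules = [(0.0, [(0.0, 1.0)]), (10.0, [(2.0, 1.0)])]
y_mid = fuzzy_output([1.0], rules)
```

The centres and widths are exactly the shape parameters that the second stage of fuzzy modelling has to optimize.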

One of the simplest genetic algorithm approaches to the problem of choosing
the membership functions is based on a previously defined, wide bank of such
functions. The chromosome encodes the number of fuzzy sets belonging to a
given variable and the indices of the membership functions chosen from the
bank. The modelling error, obtained using some gradient-descent method,
constitutes the chromosome fitness. An optimal structure of the fuzzy system
is searched for by the evolutionary
process.

The gradient-descent methods applied to search for an optimal set of
membership function parameters are very often disappointing, as they become
trapped in a local optimum. Thus the evolutionary approach to these
problems is justified. However, a hybrid method, in which an evolutionary
algorithm is used in the initial stage of the search and a gradient-descent
method then improves the obtained result, is more effective
(Rutkowska, 2000).
8.6. Summary

Designing a fault diagnosis system for complex dynamic systems is usually 
connected with a lack of a mathematical model, or with the fact that such 
a model is unsatisfactory. Initially, the development of diagnostic techniques 
which are based on analytical models and the theories of estimation and 
identification was observed. Recently, artificial intelligence methods have at- 
tracted researchers’ attention. It is worth noticing that the process of design- 
ing fault diagnosis systems, using both analytical and artificial intelligence 
methods, can be reduced to a set of complex optimization problems. These
are usually non-linear, multimodal and, quite often, multi-objective, so
conventional local optimization algorithms are insufficient to solve them.
Evolutionary algorithms seem to be an attractive tool for searching for an 
optimal solution. Although there are few applications of evolutionary algo- 
rithms to fault diagnosis systems, a discussion of the existing solutions and 
their possibilities as well as the possibilities of a further development has 
been presented in this chapter. 
Chapter 9



ARTIFICIAL NEURAL NETWORKS
IN FAULT DIAGNOSIS§



Krzysztof PATAN*, Jozef KORBICZ* 



9.1. Introduction 

In recent years there has been an increasing demand for dynamic
systems in industrial plants to become safer and more reliable. These
requirements go beyond the normally accepted safety-critical systems of
nuclear reactors, chemical plants or aircraft. An early detection of faults
can help avoid a system shut-down, component failures and even catastrophes
involving large economic losses and human fatalities. A system that makes it
possible to detect, isolate and identify faults is called a fault diagnosis
system (Chen and Patton, 1999). The basic idea is to generate signals that
reflect inconsistencies between the nominal and faulty system operating
conditions. Such signals, called residuals, are usually calculated using
analytical methods such as observers (Chen and Patton, 1999), parameter
estimation (Isermann, 1994) or parity equations (Gertler, 1999).
Unfortunately, the common disadvantage of these approaches is that a precise
mathematical model of the diagnosed plant is required, and their application
is therefore limited. An alternative solution can be obtained using artificial
intelligence. Artificial neural networks seem to be particularly attractive
when designing fault diagnosis schemes. They can be effectively applied to
both the modelling of the plant operating conditions and decision making
(Korbicz et al., 2002).

Neural networks have been successfully used in many applications includ- 
ing the fault diagnosis of non-linear dynamic systems. It is worth mentioning 

§ The work was supported by the EU FP 5 Project Research Training Network 
DAMADICS: Development and Application of Methods for Actuators Diagnosis in 
Industrial Control Systems, 2000-2003. 

* University of Zielona Gora, Institute of Control and Computation Engineering,
ul. Podgorna 50, 65-246 Zielona Gora, Poland, e-mail: {K.Patan,J.Korbicz}@issi.uz.zgora.pl



J. Kacprzyk et al.(eds.), Fault Diagnosis 
© Springer-Verlag Berlin Heidelberg 2004







the applications that follow. A multi-layer perceptron was used to detect
leakages in an electro-hydraulic cylinder drive in a fluid power system
(Watton and Pham, 1997). The authors showed that maintenance information can
be obtained from the monitored data using neural networks instead of a human
operator. Crowther et al. (1998) presented the application of neural networks
to the fault diagnosis of hydraulic actuators. They showed that experimental
faults can be diagnosed by neural networks trained on simulation data only. In
turn, Weerasinghe et al. (1998) investigated the application of a single
neural network to the diagnosis of non-catastrophic faults in an industrial
nuclear processing plant working at different operating points. The problem of
fault diagnosis in rigid-link robotic manipulators with modelling
uncertainties was investigated by Vemuri and Polycarpou (1997). A neural
architecture with sigmoidal processing elements was used to monitor the
robotic system for any off-nominal behaviour due to faults. Yu et al. (1999)
investigated a semi-independent neural model of the RBF type to generate
enhanced residuals for diagnosing sensor faults in a chemical reactor. Dynamic
neural networks were applied to the on-line fault detection of power systems
(Korbicz et al., 1998), chemical plants (Fuente and Saludes, 2000) or the food
industry (Lissane Elhaq et al., 1999; Patan and Korbicz, 2000b).

This chapter is divided into two parts. The first part is devoted to different
neural network structures and the possibilities of their application to the
fault diagnosis of technical processes. There are many neural architectures
which can be effectively applied to residual generation as well as residual
evaluation. Residual generation can be performed using neural networks with
dynamic characteristics such as multi-layer perceptrons with tapped delay
lines, recurrent networks or GMDH (Group Method of Data Handling) networks.
In turn, the residual evaluation task can be performed using multi-layer
perceptrons, self-organizing maps, radial basis networks or a multiple network
structure. The second part of this chapter presents examples of practical
applications of artificial neural networks to fault diagnosis. First, fault
detection and isolation systems for a laboratory two-tank system with delay
are presented. These experiments have been carried out using simulated data.
The second group of experiments concerns instrumentation fault detection and
uses data recorded at the Lublin Sugar Factory in Poland.



9.2. Structure of a neural fault diagnosis system 

The objective of a fault diagnosis system is to determine the location and
occurrence time of possible faults based on accessible data and knowledge
about the behaviour of the diagnosed process, for example, by using
mathematical or quantitative models. Figure 9.1 shows a general structure of
model-based fault diagnosis (Chen and Patton, 1999; Patton et al., 2000). One
can see here that the fault diagnosis procedure for a dynamic system consists
of two separate stages: residual generation and residual evaluation. In other
words, automatic fault diagnosis can be viewed as a sequential process
involving symptom extraction and the actual diagnostic task. Usually, the
residual generation process is based on a comparison between the measured and
predicted system outputs. The difference, the so-called residual, is expected
to be near zero under normal operating conditions, but when a fault occurs, a
deviation from zero should appear. In turn, the residual evaluation module is
dedicated to the analysis of the residual signal in order to determine whether
or not a fault has occurred and to isolate the fault in a particular system
device. An accurate model of the diagnosed system is very important.
Technological plants are often complex dynamic systems described by non-linear
high-order differential equations. For their quantitative modelling for
residual generation, simplifications are inevitable. This usually concerns
both the reduction of the order of the dynamics and linearisation. Another
problem arises from unknown or time-variant process parameters. Due to all
these difficulties, conventional analytical models often turn out to be not
accurate enough for effective residual generation. In this case
knowledge-based models are the only alternative.

Fig. 9.1. General scheme of a model-based fault diagnosis

Artificial neural networks provide an excellent mathematical tool for dealing
with non-linear problems (Fine, 1999; Nelles, 2001). They have an important
property according to which any continuous non-linear relation can be
approximated with an arbitrary accuracy using a neural network with a
suitable architecture and weight parameters. Another attractive property is
their self-learning ability. A neural network can extract system features
from historical training data using the training algorithm, requiring little
or no a priori knowledge about the process. This provides the modelling of
non-linear systems with great flexibility (Haykin, 1999; Norgard et al., 2000).
These features allow one to design adaptive control systems for complex,
unknown and non-linear dynamic processes. Another interesting property of
neural networks is parallel processing. This feature is extremely useful when
solving different pattern recognition problems.

For residual generation, the neural network replaces the analytical model 
(Chen and Patton, 1999; Frank and Marcu, 2000) that describes the process 
under the normal operating conditions. First, the network has to be trained 
for this task. Training data can be collected directly from the process, if pos- 
sible, or from a simulation model that is as realistic as possible. The latter 
possibility is of special interest for data acquisition in different faulty situa- 
tions in order to test the residual generator, as those data are not generally 
available during the real process. The training process can be carried out off- 
line or on-line (it depends on the availability of data) . The possibility to train 
a network on-line is very attractive, especially when adapting a neural model 
to mutable environments or non-stationary systems. After finishing the train- 
ing, the neural network is ready for on-line residual generation. As can be seen 
in Fig. 9.2, a bank of neural models should be designed. Each neural model 




Fig. 9.2. Neural fault detection scheme based on a bank of process models 



represents one class of system behaviour. One model represents the system
under its normal conditions and each successive one represents it in a given
faulty situation. After that, the residuals can be determined by comparing
the system output $y(k)$ and the outputs of the models $y_0(k), y_1(k), \ldots, y_n(k)$.
In this way, the residual vector $r = [r_0(k), r_1(k), \ldots, r_n(k)]^T$, which characterizes
the classes of system behaviour, can be obtained. Finally, the residual $r$
should be transformed by a classifier to determine the location and time of
the occurrence of faults.
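
The residual generation step of the scheme in Fig. 9.2 can be sketched as follows. This is only an illustration under stated assumptions: the model outputs are toy signals rather than trained neural models, and `closest_model` is a hypothetical stand-in for the classifier, which the text does not specify at this point.

```python
import numpy as np

def residual_bank(y, model_outputs):
    """Return residuals r_i(k) = y(k) - y_i(k) for each model in the bank.

    y             : array of shape (K,), measured system output
    model_outputs : array of shape (n + 1, K), row i holds y_i(k)
    """
    y = np.asarray(y, dtype=float)
    model_outputs = np.asarray(model_outputs, dtype=float)
    return y[np.newaxis, :] - model_outputs

def closest_model(residuals):
    """Hypothetical stand-in for the classifier: index of the model
    with the smallest mean absolute residual."""
    return int(np.argmin(np.mean(np.abs(residuals), axis=1)))

# Toy data (assumption): the system behaves like model 1.
k = np.arange(50)
y0 = np.sin(0.1 * k)           # model of the normal conditions
y1 = 0.5 * np.sin(0.1 * k)     # model of the first faulty situation
y2 = np.sin(0.1 * k) + 0.3     # model of the second faulty situation
y = y1 + 0.01 * np.cos(k)      # measured output, close to y1

r = residual_bank(y, np.vstack([y0, y1, y2]))
best = closest_model(r)
```

The row of `r` that stays close to zero indicates the class of behaviour whose model currently matches the system.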

9.3. Neural models in modelling

In general, artificial neural networks can be applied to fault diagnosis in 
order to solve both the modelling and classification problems (Chen and 
Patton, 1999; Koivo, 1994). To date, many neural structures with dynamic 
characteristics have been developed. These structures are characterised by 
good effectiveness in modelling non-linear processes. Among many, one can 
distinguish a multi-layer perceptron with tapped delay lines, recurrent networks
or networks of the GMDH type (Duch et al., 2000). The properties
of these structures and details concerning the design methodology are
presented in this section.

9.3.1. Multi-layer perceptron 

Artificial neural networks are constructed with a certain number of single 
processing units called neurons. A standard neuron model is described by 
the following equation (Duch et al., 2000; Fine, 1999):

$$y = F\left(\sum_{p=1}^{P} w_p u_p + u_0\right), \tag{9.1}$$

where $u_p$, $p = 1, 2, \ldots, P$, are the neuron inputs, $u_0$ is the threshold, $w_p$ are
the synaptic weight coefficients, and $F(\cdot)$ is the non-linear activation function.
Originally, the step function was used as the activation function.
However, sigmoid or hyperbolic tangent functions are more powerful
and most frequently used (Haykin, 1999).

The multi-layer perceptron is a network in which neurons are grouped 
into layers (Fig. 9.3). Such a network has an input layer, one or more hidden 
layers and an output layer. The main task of the input units (black squares)
is the preliminary processing of the input data $u = [u_1, u_2, \ldots, u_P]^T$ and passing
them on to the elements of the hidden layer. Data processing can comprise, e.g.,




Fig. 9.3. Three-layer perceptron with P inputs and N outputs 
scaling, filtering or signal normalization. Fundamental neural data processing
is carried out in the hidden and output layers. Note that links between the
neurons are designed in such a way that each element
of the previous layer is connected with each element of the next layer. These
connections are assigned suitable weight coefficients, which are determined,
for each separate case, depending on the task the network should solve. The
output layer generates the network response vector $y$. Non-linear neural
computing performed by the network shown in Fig. 9.3 can be expressed by

$$y = F_3\{W_3 F_2[W_2 F_1(W_1 u)]\}, \tag{9.2}$$

where $F_1$, $F_2$ and $F_3$ denote the non-linear operators which define neural
signal transformation through the 1st, the 2nd and the output layer, $W_1$,
$W_2$ and $W_3$ are the matrices of weight coefficients which determine the
intensity of connections between neurons in the neighbouring layers; $u$ and
$y$ denote the input and output vectors, respectively.
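
As a minimal sketch of the computation in (9.2), the forward pass can be written as a loop over layers. The layer sizes, the random weights and the linear output operator are assumptions made only for illustration; thresholds are omitted, as in (9.2).

```python
import numpy as np

def mlp_forward(u, weights, activations):
    """Evaluate y = F3{W3 F2[W2 F1(W1 u)]} layer by layer."""
    signal = np.asarray(u, dtype=float)
    for W, F in zip(weights, activations):
        signal = F(W @ signal)
    return signal

rng = np.random.default_rng(0)
P, H1, H2, N = 4, 5, 3, 2              # illustrative layer sizes (assumption)
W1 = rng.normal(size=(H1, P))
W2 = rng.normal(size=(H2, H1))
W3 = rng.normal(size=(N, H2))

def identity(x):
    """Linear output operator, chosen here only for simplicity."""
    return x

u = rng.normal(size=P)
y = mlp_forward(u, [W1, W2, W3], [np.tanh, np.tanh, identity])
```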

One of the fundamental advantages of neural networks is the fact that 
they have the learning and adaptation abilities. From the technical point of 
view, the training of a neural network is simply the determination of weight 
coefficient values between the neighbouring processing units. The fundamental
training algorithm for feed-forward multi-layer networks is the Back-Propagation
(BP) algorithm. It prescribes how to change any
weight value assigned to connections between processing units in
the neighbouring layers of the network. This algorithm is iterative
and is based on the minimization of a sum-squared error using the gradient
descent method. The modification of the weights is performed according to
the formula (Duch et al., 2000; Haykin, 1999):

$$w(k+1) = w(k) - \eta \nabla E(w(k)), \tag{9.3}$$

where $w(k)$ denotes the weight vector at the discrete time $k$, $\eta$ is the training
rate, and $\nabla E(w(k))$ is the gradient of the performance index $E$ with respect
to the weight vector $w$. The neural networks described above are of the static
type, and by using them it is possible to approximate any continuous
non-linear, although static, function.
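
The update rule (9.3) can be illustrated on the simplest possible case. The sketch below assumes a single linear neuron and a sum-squared error, so that the gradient has a closed form; the data, the training rate and the number of steps are arbitrary choices, and full back-propagation through several layers is not shown.

```python
import numpy as np

def sum_squared_error(w, U, d):
    """E(w) = 0.5 * sum of squared errors of a single linear neuron."""
    e = d - U @ w
    return 0.5 * float(e @ e)

def gradient(w, U, d):
    """Gradient of E with respect to w for the linear neuron above."""
    return -U.T @ (d - U @ w)

def train(w, U, d, eta=0.01, steps=500):
    """Iterate w(k+1) = w(k) - eta * grad E(w(k))."""
    for _ in range(steps):
        w = w - eta * gradient(w, U, d)
    return w

rng = np.random.default_rng(1)
U = rng.normal(size=(40, 3))           # 40 training patterns, 3 inputs
w_true = np.array([0.5, -1.0, 2.0])    # weights used to generate the targets
d = U @ w_true                         # noise-free desired outputs
w0 = np.zeros(3)
w_final = train(w0, U, d)
```

With a training rate that is too large the iteration diverges, which is why the rate is kept small here.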

However, the neural network modelling of control systems requires taking 
into account the dynamics of the analysed processes or systems. It can be 
achieved by introducing tapped delay lines into the system model (Kuschewski
et al., 1993; Norgard et al., 2000) (Fig. 9.4). Let us assume that a process
is described by the following non-linear discrete difference equation:



$$y_p(k+1) = f\big(y_p(k), y_p(k-1), \ldots, y_p(k-n), u(k), u(k-1), \ldots, u(k-m)\big), \tag{9.4}$$

where $n$ and $m$ are the output and input signal delays, respectively, and
$f(\cdot)$ is the non-linear function. The corresponding system model, known as

Fig. 9.4. Modelling using a feed-forward network; TDL - Tapped Delay Lines



the series-parallel identification model (Fig. 9.4), can be described by the 
following difference equation (Narendra and Parthasarathy, 1990): 



$$y(k+1) = \hat{f}\big(y_p(k), y_p(k-1), \ldots, y_p(k-n), u(k), u(k-1), \ldots, u(k-m)\big), \tag{9.5}$$

where $\hat{f}(\cdot)$ represents the non-linear input-output relation of the neural network
(an approximation of the relation $f(\cdot)$). It is crucial to notice that the input
of the neural network in this case contains the past values of the process
output. Let us assume that a feed-forward network has been trained up to a
given accuracy and faithfully represents the process ($y \approx y_p$). Then, the neural
model can be described by the following formula (Hunt et al., 1992):

$$y(k+1) = \hat{f}\big(y(k), y(k-1), \ldots, y(k-n), u(k), u(k-1), \ldots, u(k-m)\big). \tag{9.6}$$

In this way, the properly adjusted neural network can be used independently 
for modelling. In fact, the neural network described by the formula (9.6) 
uses its own output values as a part of the input space. Thus, feedback 
from the network output to its input is introduced, and the feed-forward 
network becomes a recurrent network with outer feedback (Fig. 9.5) (Haykin, 
1999; Norgard et al., 2000).
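
The difference between the series-parallel model (9.5) and the model with outer feedback (9.6) can be sketched as two simulation loops. Here `f_hat` is a hypothetical fixed map standing in for a trained network with $n = m = 1$, and the "process" data are generated by the same map, so both loops are expected to reproduce it; none of the coefficients come from the text.

```python
import numpy as np

def f_hat(y_past, u_past):
    """Hypothetical trained model with n = 1 and m = 1.

    y_past = [y(k), y(k-1)], u_past = [u(k), u(k-1)].
    """
    return (0.6 * y_past[0] - 0.1 * y_past[1]
            + 0.5 * np.tanh(u_past[0]) + 0.2 * u_past[1])

def series_parallel(y_p, u):
    """One-step-ahead prediction (9.5): delayed process outputs are inputs."""
    y = np.zeros(len(u))
    for k in range(1, len(u) - 1):
        y[k + 1] = f_hat([y_p[k], y_p[k - 1]], [u[k], u[k - 1]])
    return y

def parallel(u, y_init):
    """Free-run simulation (9.6): the model feeds back its own outputs."""
    y = np.zeros(len(u))
    y[:2] = y_init
    for k in range(1, len(u) - 1):
        y[k + 1] = f_hat([y[k], y[k - 1]], [u[k], u[k - 1]])
    return y

u = np.sin(0.2 * np.arange(30))
y_p = parallel(u, y_init=[0.0, 0.0])     # stand-in for measured process data
y_sp = series_parallel(y_p, u)
y_par = parallel(u, y_init=y_p[:2])
```

With a real process and an imperfect model, the series-parallel prediction is corrected by measurements at every step, whereas the free-run output can drift because model errors are fed back.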

9.3.2. Recurrent networks 

Feed-forward networks can only represent static mappings, and therefore they 
need past inputs and outputs of the modelled process. This can be performed 

Fig. 9.5. Relation between the feed-forward and recurrent networks



by introducing suitable delays. Unfortunately, such networks have some dis- 
advantages. First of all, it is required to know the exact order of the process. 
If the order of the process is known, all necessary inputs and outputs should 
be fed to the network. In this way the input space of the network becomes 
large. In many practical cases, it is not possible to determine the order of the
system, and the number of suitable delays has to be selected experimentally.
Another disadvantage is the limited past history horizon, which prevents
the modelling of arbitrarily long time dependencies between the inputs and
the desired outputs. Moreover, the trained network has strictly static, not
dynamic, characteristics. More natural, dynamic behaviour is assured by
recurrent networks. Recurrent neural networks (Narendra and Parthasarathy,
1990) are characterized by considerably better properties from the point of 
view of their application to control theory. As a result of feedback introduced 
into the network structures, it is possible to accumulate information and use 
it later. From the point of view of the possible feedback location, recurrent 
networks can be divided as follows (Tsoi and Back, 1994): 

• local recurrent networks - feedback is only inside neuron models. These 
networks have a structure similar to that of static feed-forward ones, but 
consist of dynamic neuron models. 

• global recurrent networks - feedback is allowed between neurons of 
different layers or between neurons of the same layer. 

The most general architecture of a recurrent neural network was proposed 
by Williams and Zipser (1989). This structure is often called the Real Time
Recurrent Network (RTRN), because it was designed for real-time signal
processing. The network consists of $m$ neurons, each of which has feedback
(Fig. 9.6(a)). Any connections between the neurons are allowed. Thus, a
fully connected neural architecture is obtained. The fundamental advantage
of such networks is the possibility of approximating a wide class of dynamic
relations. However, training the network is usually complex and slowly convergent
(Haykin, 1999). Moreover, there are problems with keeping the network stable.

Partial recurrent networks, e.g., the Elman network, have a less general
character (Elman, 1990) (Fig. 9.6(b)). The implementation of such networks 
is considerably less expensive than in the case of a multi-layer perceptron. 
This network consists of four layers of units: the input layer with r units, 
the context layer with s units, the hidden layer with s units and the output 
layer with n units. The input and output units interact with the outside en- 
vironment, whereas the hidden and context units do not. The context units 
are used only to memorize the previous activations of the hidden neurons. 
A very important assumption is that in the Elman structure the number of 
context units is equal to the number of hidden units. All feed-forward con- 
nections are adjustable; the recurrent connections denoted by a thick arrow 
in Fig. 9.6(b) are fixed. Theoretically, this kind of network is able to model
an $s$-th order dynamic system, if it can be trained to do so (Haykin, 1999).
At the specific time $k$, the previous activations of the hidden units (at the
time $k - 1$) and the current inputs (at the time $k$) are used as inputs to
the network. In this case, the Elman network’s behaviour is analogous to the 
behaviour of a feed-forward network. Therefore the standard BP algorithm 
can be applied to train network parameters. However, it should be kept in 
mind that such simplifications limit the application of the Elman structure 
to the modelling of dynamic processes. 
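
A minimal sketch of one time step of the Elman structure is given below. The class name, the random weight initialisation, the `tanh` hidden activation and the linear output are assumptions made for illustration; only the roles of the layers follow the description above.

```python
import numpy as np

class ElmanNetwork:
    """Elman network with r inputs, s hidden/context units and n outputs."""

    def __init__(self, r, s, n, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.normal(scale=0.5, size=(s, r))    # input -> hidden
        self.W_ctx = rng.normal(scale=0.5, size=(s, s))   # context -> hidden
        self.W_out = rng.normal(scale=0.5, size=(n, s))   # hidden -> output
        self.context = np.zeros(s)                        # hidden state at k - 1

    def step(self, u):
        """Process the input u(k) together with the stored context."""
        hidden = np.tanh(self.W_in @ u + self.W_ctx @ self.context)
        self.context = hidden.copy()    # fixed one-to-one copy into the context
        return self.W_out @ hidden

net = ElmanNetwork(r=2, s=4, n=1)
u = np.array([1.0, -0.5])
y_first = net.step(u)
y_second = net.step(u)    # same input, but the context has changed
```

Because the context units hold the previous hidden activations, the same input produces different outputs at successive time steps.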

A recurrent network developed by Parlos et al. (1994) has yet another
architecture (Fig. 9.6(c)). An RMLP (Recurrent Multi-Layer Perceptron) network
can be designed based on a multi-layer perceptron network by adding
delayed links between the neighbouring units of the same hidden layer
(cross-talk links), including the feedback of a unit to itself (recurrent links)
(Parlos et al., 1994). Empirical evidence indicates that by using delayed
recurrent and cross-talk weights the RMLP network is able to emulate a large
class of non-linear dynamic systems. The feed-forward part of the network still
maintains the well-known curve-fitting properties of the MLP network, while the
feedback part provides the dynamic character of the RMLP network. Moreover, the
usage of past process observations is not necessary, because their effect is
captured by internal network states. The RMLP network has been successfully
used as a model for dynamic system identification (Parlos et al., 1994).
However, a drawback of this dynamic structure is the increased network
complexity and the resulting long training time. For a network belonging to the
class $N_{1,m,1}$, the number of network parameters is equal to $m^2 + 3m$.

Locally recurrent networks. All recurrent networks described in the pre- 
vious sections are called globally recurrent neural networks. In such networks 
all possible connections between neurons are allowed, but all of the neurons in 
a network structure are static ones. Generally, globally recurrent neural
networks, in spite of their usefulness in control theory, have some disadvantages.
These architectures suffer from a lack of stability; for a given set of initial
values the activations of linear output neurons may grow without bound. This
causes a slow convergence of training algorithms. It could take many time
instants for training algorithms to achieve a proper stable output (i.e. a local
minimum in the parameter space). An alternative solution, which provides
the dynamic behaviour of the neural model, is a network designed using
dynamic neuron models. Such networks have an architecture that is somewhere
in between a feed-forward and a globally recurrent architecture. Tsoi and Back
(1994) called this class of neural networks Locally Recurrent Globally
Feed-forward (LRGF) networks. Based on the well-known McCulloch-Pitts neuron
model, different dynamic neuron models can be designed. In general, differences
between these depend on the location of internal feedback.

Model with local activation feedback. This neuron model was studied by 
Frasconi et al. (1992), and may be described by the following equations: 

$$\varphi(k) = \sum_{p=1}^{P} w_p u_p(k) + \sum_{r=1}^{R} d_r \varphi(k-r), \tag{9.7a}$$

$$y(k) = F(\varphi(k)), \tag{9.7b}$$

where $u_p$, $p = 1, 2, \ldots, P$, are the inputs of the neuron, $w_p$ are the
input weights, $\varphi(k)$ is the activation potential, $d_r$, $r = 1, 2, \ldots, R$, are the
coefficients which determine the feedback intensity of $\varphi(k-r)$, and $F$ is a
non-linear activation function. The input to the neuron can be a combination
of input variables and delayed versions of the activation $\varphi(k)$. Note that the
right hand summation in (9.7a) can be interpreted as the Finite Impulse
Response (FIR) filter. This neuron model has the feedback signal implemented
before the non-linear activation block.
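
Equations (9.7a) and (9.7b) translate directly into a short simulation loop. The function below is a sketch assuming zero initial conditions for $\varphi$ and a `tanh` activation; the impulse input and the coefficient values are chosen only so that the result is easy to check by hand.

```python
import numpy as np

def activation_feedback_neuron(U, w, d, F=np.tanh):
    """Simulate (9.7a)-(9.7b) over a sequence of input vectors.

    U : array (K, P) of inputs u_p(k)
    w : array (P,) of input weights
    d : array (R,) of feedback coefficients d_r
    """
    K, R = len(U), len(d)
    phi = np.zeros(K)
    for k in range(K):
        # Feedback of the delayed activation potentials (zero before k = 0).
        feedback = sum(d[r - 1] * phi[k - r]
                       for r in range(1, R + 1) if k - r >= 0)
        phi[k] = U[k] @ w + feedback
    return F(phi), phi

U = np.zeros((6, 2))
U[0] = [1.0, 0.0]                        # unit impulse on the first input
y, phi = activation_feedback_neuron(U, w=np.array([1.0, 0.5]),
                                    d=np.array([0.5]))
```

For this impulse input and $d_1 = 0.5$ the activation potential decays geometrically, which shows the memory introduced by the local feedback.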

Model with local synapse feedback. Tsoi and Back (1994) introduced a 
neuron architecture with local synapse feedback. In this structure, instead of 
a synapse in the form of a weight, the synapse with a linear transfer function 
(Infinite Impulse Response, IIR filter) with poles and zeros is applied. In this
case, the neuron is described by the following set of equations: 

$$y(k) = F\left(\sum_{p=1}^{P} G_p(z^{-1}) u_p(k)\right), \tag{9.8a}$$

$$G_p(z^{-1}) = \frac{\sum_{j=0}^{m} b_j z^{-j}}{\sum_{j=0}^{n} a_j z^{-j}}, \tag{9.8b}$$

where $u_p(k)$, $p = 1, 2, \ldots, P$, is the set of inputs to the neuron, $G_p(z^{-1})$ is
the transfer function, and $b_j$, $j = 0, 1, \ldots, m$, and $a_j$, $j = 0, 1, \ldots, n$, are its
numerator and denominator coefficients, which determine its zeros and poles,
respectively. As seen from (9.8b), the transfer function has $m$ zeros
and $n$ poles. Note that the inputs $u_p(k)$, $p = 1, 2, \ldots, P$, may be taken from
the outputs of the previous layer, or from the output of the neuron. If they
are derived from the previous layer, then it is local synapse feedback. On
the other hand, if they are derived from the output $y(k)$, it is local output
feedback. Moreover, the local activation feedback is a special case of the local
synapse feedback architecture. In this case, all synaptic transfer functions
have the same denominator and only one zero.

Model with local output feedback. Another dynamic neuron architecture 
was proposed by Gori et al. (1989). In contrast to the local synapse and 
the local activation feedback, this neuron model implements the feedback 
after the non-linear activation block. In a general case, such a model can be 
described as follows: 



$$y(k) = F\left(\sum_{p=1}^{P} w_p u_p(k) + \sum_{q=1}^{Q} c_q y(k-q)\right), \tag{9.9}$$

where $c_q$, $q = 1, 2, \ldots, Q$, are the coefficients which determine the feedback
intensity of the neuron output $y(k-q)$. In this architecture, the output of
the neuron is filtered by the FIR filter, whose output is added to the inputs,
providing the activation. It is easy to see that by applying the IIR
filter to the neuron output a more general structure can be obtained
(Tsoi and Back, 1994). The work of Gori et al. (1989) is founded on
the work by Mozer (1989). In fact, one can consider this architecture as a
generalization of the Jordan-Elman architecture (Elman, 1990).

Memory neuron. Memory neuron networks were introduced by Poddar 
and Unnikrishnan (1991). These networks consist of neurons which have a 
memory, i.e., they contain information regarding past activations of their parent
network neurons. The mathematical description of such a neuron is presented
below:

$$y(k) = F\left(\sum_{p=1}^{P} w_p u_p(k) + \sum_{p=1}^{P} s_p z_p(k)\right), \tag{9.10a}$$

$$z_p(k) = \alpha_p u_p(k-1) + (1 - \alpha_p) z_p(k-1), \tag{9.10b}$$

where $z_p$, $p = 1, 2, \ldots, P$, are the outputs of the memory neurons from the
previous layer, $s_p$, $p = 1, 2, \ldots, P$, are the weight parameters of the memory
neuron output $z_p(k)$, and $\alpha_p = \text{const}$ is a coefficient. It is observed that
the memory neuron "remembers" the past output values of that particular
neuron. In this case, the memory is in the form of an exponential filter. This
neuron structure can be considered as a special case of the generalized local
output feedback architecture. It has a feedback transfer function with one
pole only. Memory neuron networks have been intensively studied in recent
years, and there are some interesting results regarding the application of this
architecture to the identification and control of dynamic systems.
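
The exponential memory in (9.10b) can be illustrated with a few lines of code. The sketch assumes zero initial memory and a single coefficient `alpha` shared by all inputs; the identity activation in the usage example is chosen only so that the memory state is visible at the output.

```python
import numpy as np

def memory_neuron(U, w, s, alpha, F=np.tanh):
    """Simulate (9.10a)-(9.10b); z_p(k) is an exponential filter of u_p.

    U : array (K, P) of inputs, w and s : arrays (P,), alpha : scalar.
    """
    K, P = U.shape
    z = np.zeros(P)                  # z_p(0), zero initial memory (assumption)
    y = np.zeros(K)
    for k in range(K):
        if k > 0:
            z = alpha * U[k - 1] + (1.0 - alpha) * z
        y[k] = F(U[k] @ w + z @ s)
    return y

U = np.ones((4, 1))                  # constant unit input, P = 1
y = memory_neuron(U, w=np.array([0.0]), s=np.array([1.0]),
                  alpha=0.5, F=lambda x: x)
```

With a constant unit input the memory state approaches 1 geometrically, as expected from a first-order filter with a single pole.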

Models with the IIR filter. This neuron model may be equivalent to the
general local activation structure. Dynamics are introduced into the neuron
in such a way that neuron activation depends on its internal states. It is done
by introducing the IIR filter into the neuron structure (Fig. 9.7) (Ayoubi,
1994; Patan and Korbicz, 2000a). In such models one can distinguish three
main parts: the weighted summator, the filter block and the activation block. The
filter is placed between the weighted summator and the activation function. The
behaviour of the dynamic neuron model under consideration is described by
the following set of equations:

$$\begin{aligned}
x(k) &= \sum_{p=1}^{P} w_p u_p(k), \\
\tilde{y}(k) &= -\sum_{i=1}^{n} a_i \tilde{y}(k-i) + \sum_{i=0}^{n} b_i x(k-i), \\
y(k) &= F\big(g \tilde{y}(k) + c\big),
\end{aligned} \tag{9.11}$$

where $w_p$, $p = 1, \ldots, P$, are the input weights, $u_p(k)$, $p = 1, \ldots, P$, are
the neuron inputs, $P$ is the number of inputs, $\tilde{y}(k)$ is the filter output,
$a_i$, $i = 1, \ldots, n$, and $b_i$, $i = 0, \ldots, n$, are the feedback and feed-forward
filter parameters, respectively, $F(\cdot)$ is a non-linear activation function that
produces the neuron output $y(k)$, and $g$ and $c$ are the slope parameter and
the bias of the activation function, respectively.
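
A direct simulation of the dynamic neuron (9.11) is sketched below, assuming zero initial conditions for the filter and a `tanh` activation. The first-order filter in the usage example is an arbitrary choice made so that the impulse response can be checked by hand.

```python
import numpy as np

def iir_neuron(U, w, a, b, g=1.0, c=0.0, F=np.tanh):
    """Simulate the dynamic neuron (9.11).

    U : array (K, P) of inputs, w : array (P,) of weights,
    a : array (n,) of feedback parameters a_1..a_n,
    b : array (n + 1,) of feed-forward parameters b_0..b_n.
    """
    K, n = len(U), len(a)
    x = U @ w                                   # weighted sum x(k)
    y_f = np.zeros(K)                           # filter output
    for k in range(K):
        acc = sum(b[i] * x[k - i] for i in range(n + 1) if k - i >= 0)
        acc -= sum(a[i - 1] * y_f[k - i]
                   for i in range(1, n + 1) if k - i >= 0)
        y_f[k] = acc
    return F(g * y_f + c), y_f

U = np.zeros((5, 1))
U[0, 0] = 1.0                                   # unit impulse
y, y_f = iir_neuron(U, w=np.array([1.0]),
                    a=np.array([-0.5]), b=np.array([1.0, 0.0]))
```

Note the sign convention: with $a_1 = -0.5$ the filter has a stable pole at 0.5, so its impulse response is $0.5^k$.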

Due to the dynamic characteristics of neurons, a neural network of the
feed-forward structure can be designed (like in Fig. 9.3). Since this network
has no recurrent links between the neurons, a training algorithm based on the
back-propagation idea can be elaborated to adapt the network parameters.
The calculated output error is propagated back to the inputs
through hidden layers containing dynamic filters. As a result, Extended
Dynamic Back Propagation (EDBP) has been defined (Korbicz et al., 1999).
This algorithm can have both on-line and off-line forms, and therefore it can
be widely used in control theory. The choice of a proper mode depends on
the problem specification (Patan and Korbicz, 2000b).

Extended Dynamic Back Propagation. Let us consider an $M$-layered
network with dynamic neurons with differentiable activation functions $F(\cdot)$.
Let $s_m$ denote the number of neurons in the $m$-th layer, and let $u_j^m(k)$
denote the output of the $j$-th neuron of the $m$-th layer at the discrete time
$k$. The activity of the $j$-th neuron in the $m$-th layer is defined by the formula

$$u_j^m(k) = F\Big( g_j^m \Big[ \sum_{i=0}^{n} b_{ij}^m \sum_{p=1}^{s_{m-1}} w_{jp}^m u_p^{m-1}(k-i) - \sum_{i=1}^{n} a_{ij}^m \tilde{y}_j^m(k-i) \Big] + c_j^m \Big). \tag{9.12}$$

All unknown network parameters can be represented by a vector $\theta$
composed of the elements of the matrices $W$, $a$, $b$, $g$ and $c$, where
$W = [w_{jp}^m]$ is the weights matrix, $a = [a_{ij}^m]$ and $b = [b_{ij}^m]$ are
the filter parameters matrices, $g = [g_j^m]$, $m = 1, \ldots, M$, $j = 1, \ldots, s_m$, is the
slope parameters matrix, and $c = [c_j^m]$ is the bias parameters matrix.

The main objective of training is to adjust the elements of the vector $\theta$
in such a way so as to minimize some loss (cost) function:

$$\theta^{\star} = \arg\min_{\theta \in C} J(\theta), \tag{9.13}$$

where $\theta^{\star}$ is the optimal network parameter vector, $J: \mathbb{R}^l \to \mathbb{R}$ represents
some loss function to be minimized, $l$ is the dimension of the vector $\theta$, and
$C \subseteq \mathbb{R}^l$ is the constraint set defining the allowable values for the parameters
$\theta$. The way of deriving the EDBP algorithm is the same as in the case of
standard back propagation (Duch et al., 2000; Fine, 1999). The cost function
is given by

$$J(k; \theta) = \| y(k) - y(k; \theta) \|, \tag{9.14}$$

where $y(k)$ is the desired output of the network and $y(k; \theta)$ is the actual
response of the network to the given input pattern $u(k)$. The cost function
should be minimized based on a given set of input-output patterns. The
adjustment of the parameters of the $j$-th neuron in the $m$-th layer according
to the off-line EDBP algorithm has the form (Patan and Korbicz, 2000a):

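$$\theta_j^m(k+1) = \theta_j^m(k) + \eta \sum_{i=1}^{N} \delta_j^m(i) S_{\theta_j}^m(i), \tag{9.15}$$

where $N$ is the dimension of the training set, $\eta$ represents the training rate,
and the error $\delta_j^m(i)$ is defined as follows:

$$\delta_j^m(i) = \begin{cases}
\big(y_j(i) - y_j(i; \theta)\big) F'\big(g_j^m \tilde{y}_j^m(i) + c_j^m\big), & \text{for } m = M, \\
\sum_{l=1}^{s_{m+1}} \big(\delta_l^{m+1}(i) g_l^{m+1} b_{0l}^{m+1} w_{lj}^{m+1}\big) F'\big(g_j^m \tilde{y}_j^m(i) + c_j^m\big), & \text{for } m = 1, \ldots, M-1.
\end{cases} \tag{9.16}$$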

The sensitivity functions $S_{\theta}^m(k)$ for the elements of the unknown generalised
parameter $\theta$ of a given neuron in the $m$-th layer can be calculated according
to the formulae (Korbicz et al., 1999):

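(i) sensitivity with respect to the feedback filter parameter $a_{jl}^m$:

$$S_{a_{jl}}^m(k) = -g_l^m \Big( \tilde{y}_l^m(k-j) + \sum_{i=1}^{n} a_{il}^m S_{a_{jl}}^m(k-i) \Big), \quad l = 1, \ldots, s_m; \; j = 1, \ldots, n, \tag{9.17}$$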

(ii) sensitivity with respect to the feed-forward filter parameter $b_{jl}^m$:

$$S_{b_{jl}}^m(k) = g_l^m \Big( x_l^m(k-j) - \sum_{i=1}^{n} a_{il}^m S_{b_{jl}}^m(k-i) \Big), \quad l = 1, \ldots, s_m; \; j = 0, \ldots, n, \tag{9.18}$$

(iii) sensitivity with respect to the slope parameter $g_l^m$:

$$S_{g_l}^m(k) = \tilde{y}_l^m(k), \quad l = 1, \ldots, s_m, \tag{9.19}$$

(iv) sensitivity with respect to the bias $c_l^m$:

$$S_{c_l}^m(k) = 1, \quad l = 1, \ldots, s_m, \tag{9.20}$$

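(v) sensitivity with respect to the weight $w_{lp}^m$:

$$S_{w_{lp}}^m(k) = g_l^m \Big( \sum_{i=0}^{n} b_{il}^m u_p^{m-1}(k-i) - \sum_{i=1}^{n} a_{il}^m S_{w_{lp}}^m(k-i) \Big), \quad l = 1, \ldots, s_m; \; p = 1, \ldots, s_{m-1}. \tag{9.21}$$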

9.3.3. Neural networks of the GMDH type 

Successful identification depends on a proper selection of the model structure. 
In the case of artificial neural networks, the problem reduces to the selection 
of the number of layers and the number of neurons in a particular layer. 
Thus, it seems desirable to develop a tool which can be employed for the
automatic selection of the network structure, based on the measured data
only. One of the possible solutions is to use Group Method of Data
Handling (GMDH) neural networks (Farlow, 1984; Ivakhnenko and Muller, 1995; Kus
and Korbicz, 2000; Pham and Xing, 1995).

Synthesis of the GMDH network. The concept of GMDH network synthesis
(Fig. 9.9) is based on the iterative repetition of a specific sequence
of operations leading to the evolution of the resulting structure. The process
ends when the optimal (or assumed) degree of network complexity
is reached. Instead of one complex model, a hierarchical structure composed
of partial models (neurons) is applied. The partial models are built using a
single neuron model of the GMDH type (Fig. 9.8). The GMDH approach




allows much freedom in defining an elementary transfer function F(-) (Pham 
and Xing, 1995). The original algorithm developed by Ivakhnenko (1971) uses 
the second order polynomial according to the formula 

$$y(k) = a_0 + \sum_{i=1}^{m} a_i u_i(k) + \sum_{i=1}^{m} \sum_{j=1}^{m} a_{ij} u_i(k) u_j(k) + \cdots, \tag{9.22}$$

where $a_0$, $a_i$, $a_{ij}$ are the polynomial parameters. The equation (9.22) can
be rewritten in a more general form as follows:

$$y(k) = F(u) = F(u_1(k), u_2(k), \ldots, u_m(k)). \tag{9.23}$$

From the practical point of view, the function (9.23) should not be too 
complex, because it may considerably complicate training and prolong the 
computation time. The GMDH neural network is constructed by connecting a 




Fig. 9.9. GMDH network synthesis 







given number of partial models with the application of appropriate selection 
methods (Fig. 9.9). The main characteristic of the GMDH approach is the 
separate parameter estimation of each neuron. The parameters of the neurons 
are estimated in such a way that their outputs should mimic the desired signal 
as well as possible. 

The first layer of the network is formed from all possible combinations of
inputs. Usually, two-input neurons are used for simplicity. The characteristic
feature of the algorithm is an evaluation criterion which defines the
quality of processing (the processing error) of each neuron. The selection of
the best performing neurons is carried out before the newly formed layer is
incorporated into the entire network. The neuron parameters in the created
layer are frozen during further network synthesis. The outputs of the selected
neurons become the inputs of the next layer. The most frequently used criterion
to evaluate partial model quality is the regularity criterion:

$$e_R = \frac{\sum_{i=1}^{n_t} (y_i - y_i^d)^2}{\sum_{i=1}^{n_t} (y_i^d)^2}, \tag{9.24}$$

where $n_t$ is the number of testing samples, $y_i^d$ is the desired output, and
$y_i$ is the output of the partial model. The regularity criterion has
quite good properties as regards the modelling of changes in signal
tendency. Other frequently used criteria are (Kus and Korbicz, 2000): the least
deviation criterion, the convergence criterion and the combined criteria.
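
The separate parameter estimation of a single partial model and its evaluation on testing samples can be sketched as follows. The sketch assumes a two-input second-order polynomial neuron estimated by ordinary least squares, and a regularity criterion normalised by the desired output; the function names and the toy data are hypothetical.

```python
import numpy as np

def design_matrix(u1, u2):
    """Regressors of a two-input second-order polynomial neuron."""
    return np.column_stack([np.ones_like(u1), u1, u2, u1 * u2, u1**2, u2**2])

def fit_partial_model(u1, u2, y_desired):
    """Least-squares estimate of the polynomial parameters."""
    params, *_ = np.linalg.lstsq(design_matrix(u1, u2), y_desired, rcond=None)
    return params

def regularity_criterion(y_desired, y_model):
    """Normalised sum of squared errors on the testing samples."""
    return float(np.sum((y_desired - y_model) ** 2) / np.sum(y_desired ** 2))

rng = np.random.default_rng(2)
u1, u2 = rng.normal(size=(2, 60))
y_d = 1.0 + 2.0 * u1 - u2 + 0.5 * u1 * u2    # toy target, exactly representable

train_idx, test_idx = slice(0, 40), slice(40, 60)
params = fit_partial_model(u1[train_idx], u2[train_idx], y_d[train_idx])
y_test = design_matrix(u1[test_idx], u2[test_idx]) @ params
e_r = regularity_criterion(y_d[test_idx], y_test)
```

Evaluating the criterion on samples that were not used for estimation is what allows it to penalise partial models that merely fit the training data.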

The selection method plays the role of a mechanism of structural optimization
at the stage of designing a new layer of neurons. Only well-performing
partial models are preserved to build a new layer. The neurons’ outputs become
inputs to the neurons in the next layer. During selection, neurons with
too large a value of the quality index $E(y_n^{(l)})$ are rejected according to the
chosen selection method. In order to perform the selection, one of the following
methods can be applied:

• The constant population method, which is based on the selection of $g$
neurons for which $E(y_n^{(l)})$ reaches the least values. The number $g$ of
neurons is selected empirically.

• The decreasing population method, which defines the maximum num- 
ber of processing elements in a layer. The number of neurons in each layer 
decreases with the growth of the network. 

• The optimal population method, which is based on the rejection of neu- 
rons for which a proper quality index is larger than an arbitrarily determined 
threshold en (Fig. 9.10). Usually, the threshold en is determined separately 
for each layer. 
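
Two of the selection methods listed above can be sketched in a few lines. The function names and the quality index values are made up for illustration; the decreasing population method would additionally need a schedule for the maximum layer size, which the text does not specify.

```python
import numpy as np

def constant_population(errors, g):
    """Indices of the g neurons with the smallest quality index."""
    return sorted(np.argsort(errors)[:g].tolist())

def optimal_population(errors, threshold):
    """Indices of neurons whose quality index does not exceed the threshold."""
    return [n for n, e in enumerate(errors) if e <= threshold]

errors = np.array([0.30, 0.05, 0.12, 0.50, 0.08])    # made-up quality indices
kept_constant = constant_population(errors, g=3)
kept_threshold = optimal_population(errors, threshold=0.10)
```

The constant population method always keeps the same number of neurons, whereas the threshold-based method lets the layer size follow the quality of the candidates.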

New neurons in the next network layers are created in a similar way. 
During the synthesis of the GMDH network, the number of layers increases 

Fig. 9.10. Neuron selection procedure



properly. The network synthesis is completed when the optimality criterion 
is reached. The idea behind this criterion is the determination of the quality
index

$$E_{\min}^{(l)} = \min_{n} E(y_n^{(l)}) \tag{9.25}$$

for all neurons included in the $l$-th layer. The values $E(y_n^{(l)})$ can be
determined using the previously defined evaluation criteria. In turn, the values
$E_{\min}^{(l)}$ are calculated for each following layer in the network. The optimality
criterion is achieved when the following condition is satisfied:

$$E_{\text{opt}} = \min_{l = 1, \ldots, L} E_{\min}^{(l)}, \tag{9.26}$$

where $E_{\text{opt}}$ represents the processing error for the best neuron in the network,
which generates the network output. In other words, when incorporating
additional layers does not improve the performance of the network, the
synthesis process is stopped (Fig. 9.11).

In order to obtain the final structure of the network (Fig. 9.12), all unnec- 
essary output neurons should be removed just like the neurons in the previous 
layers, leaving only those which are relevant to the computation of the model 
output. This procedure is the last stage of GMDH network synthesis. 

Dynamic properties. A majority of industrial processes have a dynamic 
nature, hence during identification it is desirable to employ models which can 
represent the dynamics of the process. An obvious solution is the introduction 
of additional delayed signals. In this case, the input vector consists of properly 
delayed inputs and outputs of the network (Korbicz et al., 2002):



$$y(k) = F\big(u(k), u(k-1), \ldots, u(k-m), y(k-1), \ldots, y(k-n)\big), \tag{9.27}$$

where m and n are the numbers of delays in the input and the output, 
respectively. In the network under consideration, the dynamics can be realized 

Fig. 9.11. Illustration of the optimality criterion




Fig. 9.12. Final structure of the GMDH network 



using the modified dynamic neuron model with the IIR filter presented in
Section 9.3.2. In the proposed neuron model one can distinguish two parts:
the filter module and the activation function. The behaviour of the filter is
described by the following difference equation:

$$\tilde{y}(k) = -a_1 \tilde{y}(k-1) - \cdots - a_n \tilde{y}(k-n) + v_0^T u(k) + v_1^T u(k-1) + \cdots + v_n^T u(k-n), \tag{9.28}$$

where $a_1, \ldots, a_n$ are the feedback filter parameters ($n$ is the filter order),
$v_0, \ldots, v_n$ are the vectors of the input weights of the filter module, and $u(k)$
is the input vector at the time $k$. The filter output is used as the input to the
activation function described by the formula

$$y(k) = F(\tilde{y}(k)). \tag{9.29}$$

The objective of training in this case is to derive the feedback parameters
$a_1, \ldots, a_n$ as well as the vectors $v_i$, $i = 0, \ldots, n$.

9.4. Fault classification using neural networks

Residual evaluation is a logical decision-making process that transforms quan- 
titative knowledge into qualitative Yes or No statements. It can also be seen 
as a classification problem. The task is to match each pattern of the symp- 
tom vector with one of the pre-assigned classes of faults and the fault-free 
case. This process may highly benefit from the use of intelligent decision- 
making. A variety of well-established approaches and techniques (thresholds, 
adaptive thresholds, statistical and classification methods) can be used for 
residual evaluation (Chen and Patton, 1999). Among these approaches fuzzy 
and neural classification methods are very attractive and more and more 
frequently used in FDI systems (Patton and Korbicz, 1999). 



9.4.1. Multi-layer perceptron 

In order to apply the multi-layer perceptron to residual evaluation the network
has to be fed with residuals, which can be generated by another neural
network (see Fig. 9.2). First the network should be trained for this task using
the available data $\{r_i, f_i\}_{i=1}^{N}$, where $r_i$ is the $i$-th residual vector, $f_i$ is the
assigned fault, and $N$ is the number of the known process operating conditions
(including the normal conditions and faults). After finishing the training, the
neural classifier can be used for on-line residual evaluation to decide whether
the fault has occurred and to indicate its possible cause (Frank and
Köppen-Seliger, 1997; Koivo, 1994).

9.4.2. Kohonen network 

The Kohonen network is a self-organizing map. Such networks can learn to
detect regularities and correlations in their inputs and accordingly adapt their
future responses to those inputs. The network parameters are adapted by
a learning procedure based on input patterns only (unsupervised training)
(Haykin, 1999). Contrary to the standard supervised training methods, the
unsupervised ones use only input signals to extract knowledge from data.
During the training, there is no feedback to the environment or the investigated
process. Therefore, neurons and weighted connections should have a certain
level of self-organization. Moreover, unsupervised training is useful and
effective only when there is a redundancy of training patterns. A two-dimensional
self-organizing map is shown in Fig. 9.13. Inputs and neurons in the
competitive layer are fully connected. Furthermore, the competitive layer is
the network output, which generates the response of the Kohonen network.
Weight parameters are adapted using the winner-takes-all rule as follows
(Kohonen, 1984):



$$\| u(k) - w_c(k) \| = \min_i \{ \| u(k) - w_i(k) \| \}, \tag{9.30}$$

Fig. 9.13. Two-dimensional self-organizing map



where $u(k)$ is the input vector, $w_c(k)$ is the winner’s weight vector and
$w_i(k)$ is the weight vector of the $i$-th processing unit. However, instead
of adapting the winning neuron only, all neurons within a certain
neighbourhood of the winner are adapted according to the formula

$$w_i(k+1) = \begin{cases} (1 - \alpha)\, w_i(k) + \alpha\, u(k), & \text{for } i \in \Theta, \\ w_i(k), & \text{otherwise,} \end{cases} \tag{9.31}$$

where $\Theta$ denotes the neighbourhood of the winning neuron and $\alpha$
is the training rate. The training rate and the neighbourhood size are
altered in two phases: an ordering phase and a tuning phase. The iterative
adjustment of the training rate leads to the gradual formation of the
feature map. During the first phase, the neuron weights are expected to
order themselves in the input space consistently with the associated neuron
positions. During the second phase, the training rate continues to decrease,
but it does so very slowly. The small value of the training rate fine-tunes
the network while keeping the ordering obtained in the previous phase
stable. In the Kohonen training rule, the learning rate is a monotonically
decreasing function of time. Frequently used functions are
$\alpha(t) = 1/t$ or $\alpha(t) = a t^{-a}$ for $0 < a \le 1$.
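
The winner selection (9.30) and the neighbourhood update (9.31) can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the units are stored row by row on a rectangular grid, the neighbourhood is taken as all units within a given grid radius of the winner, and the names `winner`, `grid_neighbourhood` and `som_step` are hypothetical.

```python
import math


def winner(u, weights):
    """Index of the unit whose weight vector is closest to u, cf. (9.30)."""
    return min(range(len(weights)), key=lambda i: math.dist(u, weights[i]))


def grid_neighbourhood(c, rows, cols, radius):
    """Units within `radius` grid steps of unit c on a rectangular grid.

    Assumption: units are numbered row by row and the neighbourhood is a
    square block around the winner (radius 1 gives up to 9 units).
    """
    rc, cc = divmod(c, cols)
    return {r * cols + q
            for r in range(max(0, rc - radius), min(rows, rc + radius + 1))
            for q in range(max(0, cc - radius), min(cols, cc + radius + 1))}


def som_step(u, weights, rows, cols, alpha, radius):
    """One training step: move the winner and its neighbours towards u, cf. (9.31)."""
    c = winner(u, weights)
    for i in grid_neighbourhood(c, rows, cols, radius):
        weights[i] = [(1 - alpha) * w + alpha * x for w, x in zip(weights[i], u)]
    return c
```

With the schedule $\alpha(t) = 1/t$, a training loop would call `som_step(u, weights, rows, cols, 1.0 / t, radius)` for t = 1, 2, ..., reducing `radius` as the training proceeds.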

The concept of a neighbourhood is extremely important during network
processing. Figure 9.14 shows the neighbourhoods of radius 1 and 2. The
neighbourhood may be defined using different metrics, e.g., Euclidean,
Manhattan and others. The definition of the neighbourhood determines the
number of adapted neurons: 7 neurons belong to the neighbourhood of radius 1
defined on the hexagonal grid (Fig. 9.14(a)), whereas the neighbourhood of
radius 1 on the rectangular grid includes 9 neurons (Fig. 9.14(b)). The
dynamic change of the neighbourhood size beneficially influences the rate of
feature map ordering. The training process starts with a large neighbourhood
size. Then, as the neighbourhood size decreases to 1, the map tends to order
itself topologically over the presented input vectors. Once the
neighbourhood size is 1, the network should be fairly well ordered, and the
training rate slowly decreases over a longer period to give the neurons time
to spread out evenly across the input vectors. A typical neighbourhood
function has the form of the Mexican Hat (Duch et al., 2000). After
designing the network, a very important problem is how to assign the
clustering results generated by the network to the desired results of a
given problem. It is necessary to determine which regions of the feature map
will be active during the occurrence of a given fault.

Fig. 9.14. Winner’s neighbourhood: hexagonal grid (a), rectangular grid (b)

9.4.3. Radial basis networks

In recent years, Radial Basis Function (RBF) networks have become more and
more popular as an alternative to the slowly converging multi-layer
perceptron (Nelles, 2001). Like the multi-layer perceptron, the radial basis
network is able to model any non-linear function. However, this kind of
neural network needs many nodes to achieve the required approximation
properties. This problem is similar to the choice of the number of hidden
layers and neurons in the multi-layer perceptron.

The RBF network architecture is shown in Fig. 9.15. Such a network has three
layers: the input layer, a single hidden non-linear layer, and the linear
output layer. Note that the weights connecting the input and hidden layers
are all equal to one. This means that the input data are passed on to the
hidden layer without any weighting. The output $\varphi_i$ of the $i$-th
neuron of the hidden layer is a non-linear function of the Euclidean
distance between the input vector $u = [u_1, \ldots, u_p]^T$ and the vector
of the centres $c_i = [c_{i1}, \ldots, c_{ip}]^T$, and can be described by
the following expression:

$$\varphi_i = \phi(\| u - c_i \|, \rho_i), \quad i = 1, \ldots, n, \tag{9.32}$$

where $\rho_i$ denotes the spread of the $i$-th basis function and
$\| \cdot \|$ is the Euclidean norm. The $j$-th network output $y_j$ is a
weighted sum of the hidden neurons’ outputs:

$$y_j = \sum_{i=1}^{n} \theta_{ji} \varphi_i, \tag{9.33}$$

where $\theta_{ji}$ denotes the connecting weight between the $i$-th hidden
neuron and the $j$-th output. Many different functions $\phi(\cdot)$ have
been suggested. Those most frequently used are Gaussian functions:

$$\phi(z, \rho) = \exp\left( -\frac{z^2}{\rho^2} \right) \tag{9.34}$$

or inverted quadratic functions:

$$\phi(z, \rho) = (z^2 + \rho^2)^{-1/2}. \tag{9.35}$$
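
The forward pass described by (9.32) and (9.33) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the Gaussian basis function is assumed to have the form $\exp(-z^2/\rho^2)$, and the names `gaussian` and `rbf_forward` are hypothetical.

```python
import math


def gaussian(z, rho):
    """Gaussian basis function; the scaling exp(-z^2 / rho^2) is an assumption."""
    return math.exp(-(z * z) / (rho * rho))


def rbf_forward(u, centres, spreads, theta, basis=gaussian):
    """RBF network output for one input vector u.

    centres[i] and spreads[i] describe the i-th hidden neuron, cf. (9.32);
    theta[j][i] is the weight from hidden neuron i to output j, cf. (9.33).
    """
    hidden = [basis(math.dist(u, c), rho) for c, rho in zip(centres, spreads)]
    return [sum(t * h for t, h in zip(row, hidden)) for row in theta]
```

Note that the input vector reaches every hidden neuron unweighted; only the distance to the centre and the spread shape the hidden response, and the output layer is a plain linear combination.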

A fundamental operation in the design of the RBF network is the selection of
a proper number of hidden neurons (function centres) and their positions. On
the one hand, too small a number of centres can result in weak approximation
properties. On the other hand, the number of centres required increases
exponentially with the dimension of the network input space. Hence, the RBF
network is unsuitable for problems with a high-dimensional input space.

To train such a network, hybrid techniques are typically used (Nelles,
2001; Patan et al., 2002). First, the centres and spreads of the basis
functions are established heuristically. After that, the weights are
adjusted. The centres of the radial basis functions can be selected in many
ways, e.g., as random values distributed over the input space or by
clustering algorithms, which give a statistically best choice of the number
of centres and their positions. Once the centres are established, the
objective of the training algorithm is to determine the optimal weight
parameters $\theta_{ji}$, which minimize the difference between the desired
and the actual network response. The output of the network is linear in the
weights, which is why traditional regression methods can be used to estimate
the weight matrix. An example of such a technique is the recursive
least-squares method.
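
Because the network output is linear in the weights, the weights of one output neuron can be estimated recursively from the hidden-layer outputs. The sketch below shows one textbook form of the recursive least-squares update for a single output. The name `rls_update`, the forgetting factor `lam` and the large initial matrix `P` are assumptions made for illustration, not details taken from the cited works.

```python
def rls_update(theta, P, phi, y, lam=1.0):
    """One recursive least-squares step for a single linear output.

    theta: current weight estimates, P: symmetric matrix (list of lists),
    phi: hidden-layer outputs for this sample, y: desired network output,
    lam: forgetting factor (1.0 means no forgetting).
    """
    n = len(theta)
    # P * phi; because P is symmetric this also equals (phi^T * P)^T.
    P_phi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(phi[i] * P_phi[i] for i in range(n))
    gain = [v / denom for v in P_phi]
    # Prediction error with the weights from the previous step.
    err = y - sum(t * p for t, p in zip(theta, phi))
    new_theta = [t + g * err for t, g in zip(theta, gain)]
    new_P = [[(P[i][j] - gain[i] * P_phi[j]) / lam for j in range(n)]
             for i in range(n)]
    return new_theta, new_P
```

A typical use starts from zero weights and a large diagonal `P` (for example 1e6 on the diagonal) and applies `rls_update` once per training pattern.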

9.4.4. Multiple network structure 

In many cases, a single neural network of a finite size does not assure the
required mapping, or its generalisation ability is not sufficient. If one
can design networks each of which considers only a characteristic part of
the complete mapping, they will fulfil their task much better. The
underlying idea of the presented multiple network scheme is to develop $n$
independently trained neural networks for $n$ working points and to classify
a given input pattern by using a decision block. A general scheme of the
multiple network structure, the so-called scheme with many experts
(Marciniak and Korbicz, 2001), is presented in Fig. 9.16.




Fig. 9.16. Parallel expert scheme 



The decomposition of a complex classification problem can be performed
using independently trained neural classifiers (experts) designed in such a
way that each of them is able to recognize only a few classes. The decision
block decides which expert should classify a given pattern. This task can be
carried out using a suitable rule base of the following form:

if $u \in U_i$ then Expert#$i$, for $i = 1, \ldots, n$,   (9.36)

where $u$ is the testing sample and $U_i$ is the $i$-th input subspace. The
degree of membership of the sample $u$ in the proper subspace can be
verified using single features or a set of features. Both the premises and
the conclusions of the rules can have crisp or fuzzy values. In the case of
classical logic, the weights assigned to experts have a binary
representation, and in the case of fuzzy logic they take values from the
interval (0, 1).

Fig. 9.17. Membership function distribution: fuzzy system (a), classical logic (b)



In classical logic, features are separable (Fig. 9.17(b)). This means that
only one rule can have a non-zero value, and the final classification is
undertaken by one expert only. During fuzzy reasoning, the final
classification may depend on many experts. The response of the classifier
may be calculated in the form of a weighted sum of all experts’ outputs:

$$y = g_1 y_1 + g_2 y_2 + \cdots + g_n y_n, \tag{9.37}$$

as in Fig. 9.16, or using a defuzzification method, e.g., the centroid
method, according to the formula

$$y = \frac{\sum_{i=1}^{n} g_i y_i}{\sum_{i=1}^{n} g_i}. \tag{9.38}$$
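
The two ways of combining the experts’ outputs, (9.37) and (9.38), can be written as a short Python sketch. It assumes that every expert returns an output vector of the same length and that the gate weights $g_i$ are already available. The function names are hypothetical.

```python
def weighted_sum(g, y):
    """Combine expert outputs as in (9.37): g_1*y_1 + ... + g_n*y_n.

    g: list of gate weights, y: list of expert output vectors (equal lengths).
    """
    return [sum(gi * yi[k] for gi, yi in zip(g, y)) for k in range(len(y[0]))]


def centroid(g, y):
    """Combine expert outputs as in (9.38): weighted sum divided by sum of weights."""
    total = sum(g)
    if total == 0:
        # No rule fired: the sample belongs to none of the input subspaces.
        raise ValueError("no expert is active for this sample")
    return [v / total for v in weighted_sum(g, y)]
```

With crisp (binary) weights exactly one expert is selected and both functions return its output unchanged. With fuzzy weights the centroid form keeps the result in the range spanned by the experts’ outputs even when the weights do not sum to one.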

Experts’ modules can be implemented using artificial neural networks,
e.g., multi-layer perceptrons (Auda and Kamel, 1997) or radial basis
networks (Marciniak and Korbicz, 2001). Each neural network may have an
arbitrary structure and should be trained with a suitable algorithm using a
different training set. The only condition is that all the neural networks
have the same number of outputs.



9.5. Selected applications 

9.5.1. Two-tank laboratory system 

Let us consider a two-tank laboratory system with flow delay as the diagnosed
process (Patan and Korbicz, 2000a; Patan et al., 1999). A two-tank system
is a technical realisation of a non-linear Multi-Input Multi-Output (MIMO)
system (Fig. 9.18). The system consists of two cylindrical tanks with delay
in the form of a spiral pipeline.

K. Patan and J. Korbicz

Fig. 9.18. Scheme of a two-tank system

The nominal outflow $Q_n$ is located in the tank $T_2$. The pump driven by a
DC motor supplies the tank $T_1$, where $Q_1$ is the inflow of the liquid
through the pump to the tank $T_1$. Both tanks are equipped with sensors for
measuring the liquid levels. The valves $V_1$, $V_2$, $V_3$, $V_4$ and $V_e$
are electronically controlled. The aim of the control system is to keep the
water level in the tank $T_2$ constant. In such a system, different types of
faults like clogs or leakages can be introduced, but in the present work,
the following three faults are considered:

(i) the valve $V_2$ closed and blocked,

(ii) the valve $V_2$ opened and blocked,

(iii) a leakage in the tank $T_1$.

Residual generation. In the proposed FDI system, four classes of system
behaviour - the normal operating conditions $f_0$ and three faults $f_1$,
$f_2$ and $f_3$ - are modelled by a bank of neural observers designed with
cascade neural networks composed of dynamic neuron models with IIR filters
(Korbicz et al., 1999). The MODEL 0 identifies the system under the normal
operating conditions, and the MODEL 1, MODEL 2 and MODEL 3 model the faults
$f_1$, $f_2$ and $f_3$, respectively. Each model has one output (the liquid
level in the tank $T_1$) and two inputs (the inflow $Q_1$ and the outflow).
Examples of the modelling results are presented in Figs. 9.19(a) and
9.19(b). The characteristics of the neural models are presented in
Table 9.1. The notation $\mathcal{N}^{n}_{r,v,s}$ means that the network
consists of $n$ processing layers with $r$ inputs, $v$ hidden neurons and
$s$ output neurons. It can be seen that good modelling results have been
obtained for relatively small network sizes. The initial error displayed in
Figs. 9.19(a) and 9.19(b) depends on the quality of network training and on
the difference between the initial conditions of the system and of the
neural model. The better the modelling quality (lower network output error)
and the more similar the initial conditions, the lower the initial error.

Fig. 9.19. Liquid level in the tank $T_1$ (solid) and the output of the
MODEL 0 (dashed) under the normal operating conditions $f_0$ (a), the
output of the MODEL 1 (dashed) for the fault $f_2$ (b)

Table 9.1. Neural models characteristics

Network            System states
characteristics    $f_0$                      $f_1$                      $f_2$                      $f_3$
Name               MODEL 0                    MODEL 1                    MODEL 2                    MODEL 3
Structure          $\mathcal{N}^{2}_{2,3,1}$  $\mathcal{N}^{2}_{2,2,1}$  $\mathcal{N}^{2}_{2,2,1}$  $\mathcal{N}^{2}_{2,2,1}$
Filter order       first                      first                      second                     first

Based on neural observers, residual signals can be generated (see
Fig. 9.20). Each model is sensitive to only one class of system behaviour,
and for this class it generates a residual near zero. The residual assigned
to the normal operating conditions deviates from zero when a fault occurs
(Fig. 9.20(a)).
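
The idea of a bank of models, each producing a residual that stays near zero only for its own class of behaviour, can be sketched as follows. The sketch assumes that the model outputs have already been computed. The helper `closest_model`, which picks the model with the smallest mean absolute residual, is added purely for illustration and is not the evaluation method used in this chapter, where a neural classifier evaluates the residuals.

```python
def residual_bank(measured, model_outputs):
    """Residual sequences r_k(t) = y(t) - y_k(t) for every model in the bank.

    measured: measured process output, model_outputs: dict name -> model output.
    """
    return {name: [y - ym for y, ym in zip(measured, out)]
            for name, out in model_outputs.items()}


def closest_model(residuals):
    """Name of the model whose residual is nearest to zero on average.

    Illustrative shortcut only; it is not the classifier described in the text.
    """
    return min(residuals,
               key=lambda k: sum(abs(r) for r in residuals[k]) / len(residuals[k]))
```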

Residual evaluation. The residuals should be transformed into the
fault vector $f$. This operation can be seen as a classification problem.
For fault diagnosis, it means that each pattern of the symptom vector
$r = [r_{f_0}, r_{f_1}, r_{f_2}, r_{f_3}]$ is assigned to only one class of
system behaviour $\{f_0, f_1, f_2, f_3\}$.

Multi-layer perceptron. To perform residual evaluation, the well-known
static multi-layer feed-forward network can be used. In fact, the neural
classifier should map a relation of the form
$\Psi: \mathbb{R}^n \to \mathbb{R}^m$, $\Psi(r) = f$. Each bit of the binary
vector $f$ is used to code a single fault. An example of fault coding is
shown in Tab. 9.2. The representations of the fault vector $f$ given in
Tab. 9.2 match all known kinds of system conditions. In the case of normal
operation, the classifier should generate $f = [0\ 0\ 0]$. Any other vector
representation (e.g., $f = [1\ 1\ 1]$) shows that the system operates under
unrecognised conditions.
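
The fault coding of Tab. 9.2 amounts to a small lookup table. A minimal sketch, assuming the classifier output has already been converted to a binary vector. The names `FAULT_CODES` and `decode` are hypothetical.

```python
# Binary fault vector [f1 f2 f3] -> system condition, following Tab. 9.2.
FAULT_CODES = {
    (0, 0, 0): "normal conditions",
    (1, 0, 0): "f1: valve V2 closed and blocked",
    (0, 1, 0): "f2: valve V2 opened and blocked",
    (0, 0, 1): "f3: leakage in tank T1",
}


def decode(f):
    """Map a binary fault vector to a condition; unknown codes are flagged."""
    return FAULT_CODES.get(tuple(f), "unrecognised conditions")
```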

Fig. 9.20. Residuals for different cases (panels (a)-(d); horizontal axis: discrete time)



Table 9.2. Coding of faults

Faults                         Fault vector $f = [f_1\ f_2\ f_3]$
                               $f_1$    $f_2$    $f_3$
normal conditions              0        0        0
valve V2 closed and blocked    1        0        0
valve V2 opened and blocked    0        1        0
leak in tank 1                 0        0        1



The network applied belongs to the class A^|^ 5 ^ 4 ^ 3 . The training process 
was carried out off-line for 20000 steps using the standard back-propagation
algorithm with momentum and an adaptive training rate. The momentum
parameter was equal to 0.95 and the initial training rate was equal to 0.01.
Fifty training patterns per class of system behaviour were chosen for the
training process. As a result, a set of 200 training patterns was obtained.
The activation function in the hidden layer was of the hyperbolic tangent
type, and in the output layer of the linear type.

Fig. 9.21. Rounding of the classifier’s output

Table 9.3. Relationship between the confidence level and non-classified
and misclassified system behaviours

Confidence level    Non-classified behaviour    Misclassified behaviour
0.1                 7.8%                        15.4%
0.05                10.5%                       14.1%
0.02                13.9%                       12.1%
0.01                16.8%                       10.3%
0.005               21.9%                       8.1%
0.001               97.7%                       2.2%

Since one of the design assumptions was a binary response of the classifier
(Tab. 9.2), its outputs should be close to integers. A classifier output is
accepted when it differs from the desired value by less than an assumed
accuracy, called the confidence level (Fig. 9.21). Each classifier response
that lies inside the confidence interval is rounded to the nearest integer
(see Fig. 9.21). Table 9.3 illustrates the relation between the value of the
confidence level and the percentage of non-classified and misclassified
system behaviour (Korbicz et al., 1999). These results were obtained for the
detection of the fault $f_1$. The applied classifier has a simple structure
and can be trained with an uncomplicated training algorithm. However, it has
one serious disadvantage: the necessity to round the classifier output with
a fixed confidence level. This leads to the undesirable effect of
non-classified system behaviour. In fact, with an increase in the confidence
level, the number of instances of non-classified behaviour decreases, but
the number of instances of misclassified behaviour increases.
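
The rounding rule can be illustrated with a short Python sketch. It is one possible reading of the procedure, under the assumption that a response is rejected as non-classified as soon as any output lies outside the confidence band around its nearest integer. The function name is hypothetical.

```python
def round_with_confidence(outputs, confidence):
    """Round classifier outputs to integers if they are close enough.

    Each output must lie within `confidence` of its nearest integer;
    otherwise the whole response is treated as non-classified (None).
    """
    rounded = []
    for o in outputs:
        nearest = round(o)
        if abs(o - nearest) >= confidence:
            return None  # non-classified behaviour
        rounded.append(nearest)
    return rounded
```

A wide band accepts almost every response, including wrong ones that happen to lie near an integer, whereas a narrow band rejects most responses. This is the trade-off visible in Table 9.3.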







Kohonen network. To solve the given classification problem, a
two-dimensional Kohonen network with the following structure was used:
4 inputs (the number of residuals) and 49 processing elements (7 neurons by
7 neurons). The training set consists of 200 patterns representing 4 process
operating conditions, 50 patterns for each condition. The self-organizing
network defined on a rectangular grid was trained for 20000 steps. The
classification results are shown in Fig. 9.22. For the system working under
the nominal operating conditions (the system is healthy), the lower part of
the feature map (Fig. 9.22(a)-(c)) is active. In the case of the occurrence
of the fault $f_1$, the lower-right part is activated (Fig. 9.22(d)-(f)).
Analogous situations can be observed for the next two faults. For the fault
$f_2$, the upper-left part of the feature map is activated
(Fig. 9.22(g)-(i)), and in the case of the fault $f_3$ it is the upper-right
part (Fig. 9.22(j)-(l)).

The main advantage of the Kohonen network is that the weight parameters
are derived using input data only. The only difficulty here is the
interpretation of the results generated by the network. A system operator
needs considerable experience to interpret the results correctly. Moreover,
the so-called overlapping problem can occur: the same part of the feature
map can be active for two different operating conditions. One can solve this
problem easily by increasing the size of the feature map. In this case,
however, a longer computation time will be observed.




Fig. 9.22. Results generated by the Kohonen network

Multiple network structure. The task of the fault detection and isolation
of a two-tank system with delay can also be solved by employing a multiple
network structure. In this experiment, the structure of parallel experts
based on radial basis networks is applied. The idea of this approach is
presented in Fig. 9.23 (Marciniak and Korbicz, 2001). For each operating
point, one neural classifier is developed. Its partial response is taken
into account when calculating the whole classifier output. After that, an
interpreter block is designed, which is responsible for determining which
expert should be taken into account during decision making. The
fuzzification block shown in Fig. 9.23 plays the role of the gate presented
in Fig. 9.16. The classifier output can be calculated according to the rule
(Calado et al., 2001):

if $u \in U_i$ then $NN_i(u)$, $i = 1, \ldots, n$,   (9.39)

where $u$ is the input, $U_i(u)$ is the membership function of the $i$-th
system state, $NN_i$ is the output vector of the $i$-th neural network, and
$n$ is the number of working points.

The system’s response can be derived using (9.38). In order to estimate the
liquid levels in both tanks, the ARX (Auto-Regressive with eXogenous input)
estimator was applied as a feature extractor. During the experiments, two
working points were assumed: levels in the tank $T_1$ equal to 0.5 and
0.6 m, respectively.

Fig. 9.23. Multiple network structure for fault diagnosis

Each state of the system was represented by 50 learning patterns. The vector
of states $F$ consists of the following elements: $F$ = [nominal conditions,
leakage, valve $V_1$ closed and blocked, valve $V_1$ opened and blocked].
Two neural networks were designed and trained for the examined working
points. The networks consist of 90 and 81 hidden neurons, respectively.
Figure 9.24 shows the results obtained by the multiple network classifier in
the case of an unknown working point (not represented in the training data).

Fig. 9.24. Components of the vector $F$: nominal operating conditions (a),
valve $V_1$ closed and blocked (b), valve $V_1$ opened and blocked (c),
leakage in tank $T_1$ (d), multiple faults: leakage and valve $V_1$ closed
and blocked (e)

As one can see in Fig. 9.24, the component of the vector $F$ assigned to the
current state of the system reaches a value equal to 1, while the other
components have values near zero. When the system works in the nominal
operating conditions (Fig. 9.24(a)), the component $F[1]$ assigned to the
normal conditions has a value near one, and the other components are near
zero. Similar situations for the respective faulty scenarios can be observed
in Figs. 9.24(b)-(d). A very interesting case, concerning multiple faults,
is presented in Fig. 9.24(e). As one can see there, the neural fault
diagnosis system clearly detects both faults. It is worth noting that, in
general, the detection of multiple faults is a very difficult task. The
experiments have confirmed the practical usefulness of the proposed multiple
network structure.

9.5.2. Instrumentation fault detection 

This example shows the design procedure of the instrumentation fault
detection system in the evaporation station at the Lublin Sugar Factory,
Poland. In a sugar factory, sucrose juice is extracted by diffusion. The
juice is concentrated in a multiple-stage evaporator to produce syrup. The
liquid goes through a series of five evaporator stages, and in each passage
its sucrose concentration increases. The first three sections are of the
Roberts type with a bottom heater-chamber, while the last two are of the
Wiegends type with a top heater-chamber. The sugar evaporation control
should be performed in such a way that the energy used to achieve the
required quality of the final product is minimized. The main features that
complicate the control of the evaporation process are (Lissane Elhaq et al.,
1999):

• a highly complex evaporation structure (a great number of interacting
components),

• large time delays and response times (depending on the configuration of
the evaporators, their number, capacities, etc.),

• strong disturbances caused by violent changes in the steam,

• many constraints on several variables.

Figure 9.25 shows the first evaporation stage with four preheaters, with
all the measurable variables marked. Due to technological improvements and a
very careful inspection of the installation before starting a three-month
sugar campaign, faults in sensors, actuators and technology are rather
exceptional. Therefore, only the data for the normal operating conditions
can be used to design a fault detection system.

Based on observations of process variables and knowledge about the process,
a submodel of the juice temperature after the evaporation section can be
designed (Patan and Korbicz, 2000a):

$$T_{s2} = f(T_p, T_{s1}, F_p, F_s), \tag{9.40}$$

where $f(\cdot)$ is a non-linear function, and the process variables
appearing in (9.40) are specified in Tab. 9.4.

Fig. 9.25. First stage of sugar evaporation



Table 9.4. Specification of process variables

Variable    Symbol    Specification
$F_s$       F51_01    Thin juice flow to the input of the evaporation station
$F_p$       F51_02    Steam flow to the input of the evaporation station
$T_p$       T51_06    Input steam temperature
$T_{s2}$    T51_08    Juice temperature after the first section
$T_{s1}$    T51_05    Thin juice temperature after the heaters



Fault detection using dynamic neural networks. The neural model of the
juice temperature is designed using data recorded at the factory in October
1999 during a sugar campaign. The relevant process variables are measured at
chosen points of the evaporation station by suitable sensors. The obtained
data are then transferred to the monitoring system and stored there. The
sampling time is 10 s. During one work shift (8 hours), 2880 samples per
monitored process variable are gathered. For many industrial processes, the
measurement noise is of a high frequency. Therefore, to eliminate this
noise, a second-order low-pass Butterworth filter was used. Moreover, the
raw data should be preprocessed. The inputs of the models under
consideration are normalized to zero mean and unit standard deviation. In
turn, the output data should be transformed taking into consideration the
response range of the output neurons. For hyperbolic tangent neurons, this
range is $[-1, 1]$. Linear scaling can be used to perform this kind of
transformation. Moreover, to avoid the saturation of the activation
function, the raw output data are transformed to the interval $[-0.8, 0.8]$.
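
The two preprocessing steps can be sketched as follows. This is a minimal illustration with hypothetical function names. It assumes that the population standard deviation is used for the inputs and that the output range is taken from the minimum and maximum of the recorded data; neither detail is specified in the text.

```python
import statistics


def standardise(x):
    """Shift and scale a sequence to zero mean and unit standard deviation.

    Assumption: population standard deviation; a constant signal would
    give a division by zero and has to be handled separately.
    """
    mean = statistics.fmean(x)
    std = statistics.pstdev(x)
    return [(v - mean) / std for v in x]


def scale_linear(y, lo=-0.8, hi=0.8):
    """Map the observed range [min(y), max(y)] linearly onto [lo, hi]."""
    y_min, y_max = min(y), max(y)
    return [lo + (v - y_min) * (hi - lo) / (y_max - y_min) for v in y]
```

The scaling constants found on the training data have to be stored and reused unchanged when new measurements are processed, so that the model sees inputs and outputs on the same scale.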

First of all, neural models that identify the relevant process variables
under the normal conditions are designed. After that, residuals may be
determined by comparing the measured values and the model outputs. The
occurrence of faults is signaled by a deviation of the residual value from
zero. The training process is carried out on-line using the extended dynamic
back-propagation algorithm for networks of dynamic neurons and the data for
one work shift (8 hours). To check the sensitivity and effectiveness of the
proposed fault detection system, data with artificial faults in measuring
circuits are employed. The faults are simulated by increasing or decreasing
the values of particular signals by 5, 10 and 20% at specified time
intervals. These artificial faults were chosen very carefully, taking into
account the process safety and the minimization of economic losses. Below,
the experimental results for the detection of instrumentation faults are
described.

The dynamic network model of this process belongs to the class
$\mathcal{N}^{2}_{4,3,1}$ (4 inputs, 3 hidden neurons and 1 output neuron).
Each neuron in the network model has a first-order IIR filter and the
hyperbolic tangent activation function.

In Fig. 9.26(a), the modelling results for the juice temperature submodel
(9.40) are shown. The output of the process is marked by the black line and
the output of its neural model by the grey one. As can be seen in
Fig. 9.26(a), the predictive capabilities of the neural model are quite
good. Figure 9.26(b) presents the residual, calculated as the difference
between the process and neural model outputs, for simulated failures of
different sensors. The particular sensor faults are introduced successively
at chosen time intervals. As can be seen in Fig. 9.26(b), the faults are
very easy to detect, e.g., using a simple threshold technique. The fault
detection system is most sensitive to the failure of the output sensor
$T_{s2}$ (2100-2400 time steps). Even sensor failures smaller than 5% can be
detected immediately. Diagnostic results for the faults of the sensors $T_p$
(1500-1800 time steps) and $T_{s1}$ (2450-2750 time steps) are somewhat
worse. However, in both cases 5% failures are clearly and reliably detected
by the proposed fault detection system.

Fig. 9.26. Modelling results for the temperature model - nominal
conditions (a), residual (b)

The worst results are obtained for the failures of the sensors $F_s$ and
$F_p$ (0-600 time steps). The faults in both sensors are revealed only by
small spikes in the residuals, and only large failures are signaled. This
means that the fault detection system is not very sensitive to the
occurrence of faults in these two sensors.
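
The artificial sensor faults and the simple threshold test mentioned above can be sketched as follows. The function names and the fixed threshold are assumptions made for illustration. In practice the threshold has to be chosen from the residual observed under the normal operating conditions.

```python
def inject_fault(signal, start, stop, percent):
    """Copy of `signal` with the samples in [start, stop) changed by `percent` %.

    A positive value simulates a sensor reading too high, a negative one too low.
    """
    factor = 1.0 + percent / 100.0
    return [v * factor if start <= k < stop else v for k, v in enumerate(signal)]


def detect(residual, threshold):
    """Time indices at which the absolute residual exceeds a fixed threshold."""
    return [k for k, r in enumerate(residual) if abs(r) > threshold]
```

For a faulty output sensor, the residual is the difference between the corrupted measurement and the model output, so a 5% change of a signal with a value of 100 shows up directly as a residual of about 5.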

Fault detection system based on GMDH networks. The effectiveness of
the GMDH network is illustrated by the example of modelling the juice
temperature submodel (9.40).

First, the data were filtered as in the previous example. Furthermore, all
data sets were normalized to zero mean, and the output signals were scaled
to fall in the range $[-0.7, 0.7]$. The neural network consists of dynamic
neuron models of the GMDH type with IIR filters. Each neuron has a
second-order filter and a hyperbolic tangent activation function. During
network synthesis, the constant population method was used. The synthesis
procedure was continued until the condition (9.26) was satisfied. After the
synthesis, all redundant weighted connections were removed according to the
procedure described in Section 9.3.3.

The final structure consists of six processing layers A/J 4 4 3 3 2 ,i- In the 
parameter estimation of partial models, the outer bounding ellipsoid
algorithm was used (Korbicz et al., 2002). In order to achieve a higher
parameter estimation accuracy, information concerning the confidence region
of the derived parameters was used. This information is contained in a
special matrix, which is used in the automatic selection of the training
patterns. The processing errors for the training set, $e_u$, and the testing
set, $e_t$, are defined using the regularity criterion (9.24). The
processing errors for the last network layer are as follows: $e_u = 0.1124$
and $e_t = 0.3603$.

The output of the process and of the neural model for the training set
(700 samples) is presented in Fig. 9.27.

Fig. 9.27. Responses of the process (solid) and
model (dashed) in the case of the training set

It is clearly seen that the output of the neural model follows the output of
the process almost immediately. This confirms that the model has good
prediction capabilities. The testing of the model was carried out using
another set of 700 samples. The modelling results for this data set are
shown in Fig. 9.28(a).




Fig. 9.28. Responses of the process (solid) and model (dashed) in the
case of the testing set (a) and the residual (b) (horizontal axis: discrete time)



By comparing the output of the process with the output of the neural
model, the residual signal can be obtained (Fig. 9.28(b)). Using the
residual, one can decide whether the system is healthy or not. An analysis
of the obtained signal shows that it is generally distributed around zero,
and that large deviations occur in the case of abrupt changes in the process
response. These situations can cause false alarms, because the residual
deviates noticeably from zero although the process still works under the
normal operating conditions.

9.5.3. Actuator fault detection and isolation 

The actuator to be diagnosed is marked in Fig. 9.25 by a dashed circle.
A block scheme of this device is presented in Fig. 9.29, where the
measurable process variables are marked by a dashed line. The analysed
actuator consists of three main parts: the control valve, a linear pneumatic
servo-motor and a positioner (Bartys and Koscielny, 2002). A description of
the symbols and process variables is presented in Tab. 9.5. The control
valve is typically used to allow, prevent and/or limit the flow of fluids.
The state of the control valve is changed by a servo-motor. A pneumatic
servo-motor is a compressible fluid-powered device in which the fluid acts
upon a flexible diaphragm to provide a linear motion of the servo-motor
stem. The third part is a positioner, applied to eliminate control valve
stem mispositions caused by external or internal sources such as friction,
pressure unbalance, hydrodynamic forces, etc. The structural analysis of the
actuator and expert knowledge allow the relations between the variables to
be defined.

Fig. 9.29. Block scheme of the diagnosed actuator



Table 9.5. Description of symbols

Symbol          Variable     Specification                              Range
$V_1$, $V_2$    -            Hand driven cut-off valves                 -
$V_3$           -            Hand driven by-pass valve                  -
$V$             -            Control valve                              -
$P_1$           P51_05       Pressure sensor (valve inlet)              0 - 1000 kPa
$P_2$           P51_06       Pressure sensor (valve outlet)             0 - 1000 kPa
$T_1$           T51_01       Liquid temperature sensor                  50 - 150 °C
$F$             F51_01       Process media flowmeter                    0 - 500 m³/h
$X$             LC51_03X     Piston rod displacement                    0 - 100 %
$X_p$           -            Positioner feedback signal                 -
$P_s$           -            Pneumatic servo-motor chamber pressure     -
$CV$            LC51_03CV    Control signal                             0 - 100 %



The resulting causal graph is presented in Fig. 9.30. Besides the basic measured
variables, there are variables that could realistically be measured:

• the positioner supply pressure - $P_z$,

• the pneumatic servo-motor chamber pressure - $P_s$,

• the position P controller output - $CVI$,

and an additional set of unmeasurable physical quantities useful in structural
analysis:

• the flow through the control valve - $F_V$,

• the flow through the by-pass valve - $F_{V3}$,

• the vena contracta force - $F_{VC}$,

• the by-pass valve opening ratio - $X_3$.

Fig. 9.30. Causal graph of the main actuator variables



Taking into account the causal graph presented in Fig. 9.30 and the set
of measurable variables, the following two relations are considered:

• the servo-motor rod displacement:

$$X = r_1(CV, P_1, P_2, T_1, X), \tag{9.41}$$

• the flow through the actuator:

$$F = r_2(X, P_1, P_2, T_1), \tag{9.42}$$

where $r_1(\cdot)$ and $r_2(\cdot)$ are non-linear functions. Fault isolation is possible only
if data describing several faulty scenarios are available. Because of safety
regulations, it is impossible to generate real faulty data. Therefore,
in collaboration with the sugar factory, some faults were simulated by
manipulating process variables. During the experiments, the following
faults were considered: $f_1$ - a positioner supply pressure drop, $f_2$ - an
unexpected pressure change across the valve, and $f_3$ - a fully opened
by-pass valve.

The first faulty scenario can be caused by many factors such as pressure 
supply station faults, oversized system air consumption, breaks of air leading 
pipes, etc. This is a rapidly developing fault. The physical interpretation of 
the second fault can be media pump station failures, an increased resistance 
of pipes or external media leakages. This fault is rapidly developing as well. 
The last scenario can be caused by valve corrosion or seat sealing wear. This 
fault has an abrupt nature. 

In the proposed FDI system, four classes of system behaviour - the normal
operating conditions and three faults $f_1$-$f_3$ - are modelled by a bank of
dynamic neural networks, according to the scheme presented in Fig. 9.2. To
identify (9.41) and (9.42), a dynamic neural network with four inputs and
two outputs was used:

$$\begin{bmatrix} X \\ F \end{bmatrix} = NN(P_1, P_2, T_1, CV), \tag{9.43}$$

where $NN$ is the neural model. Fault models were trained using the Simultaneous
Perturbation Stochastic Approximation method (Patan and Parisini,
2002). In order to obtain optimal neural network structures, the Akaike
information criterion and final prediction error methods were applied.
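
To make the structure selection step concrete, the following minimal sketch evaluates the two criteria in their common textbook forms, AIC = N ln V + 2K and FPE = V (N + K)/(N - K), where V is the mean squared error, N the number of samples and K the number of model parameters. The candidate structures, parameter counts and error values are invented purely for illustration and are not the values obtained in the experiments.

```python
import math


def aic(mse, n_samples, n_params):
    """Akaike information criterion in the common form N*ln(V) + 2K."""
    return n_samples * math.log(mse) + 2 * n_params


def fpe(mse, n_samples, n_params):
    """Final prediction error V*(N + K)/(N - K)."""
    return mse * (n_samples + n_params) / (n_samples - n_params)


# Hypothetical candidates: name -> (number of parameters, training MSE).
candidates = {
    "5 hidden": (60, 0.0105),
    "7 hidden": (80, 0.0100),
    "9 hidden": (100, 0.0099),
}
N = 1000  # assumed number of training samples

best_aic = min(candidates, key=lambda k: aic(candidates[k][1], N, candidates[k][0]))
best_fpe = min(candidates, key=lambda k: fpe(candidates[k][1], N, candidates[k][0]))
```

Both criteria penalize the number of parameters, so a larger network is preferred only when it reduces the error enough to offset the penalty.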

A specification of the final neural models is presented in Tab. 9.6. The selected
neural networks have relatively small structures. Only two processing
layers with 5 or 7 hidden elements are enough to identify faults with
high accuracy. Moreover, the dynamic neurons have hyperbolic tangent
activation functions and first order IIR filters. Each neural model was trained
using suitable faulty data. After that, the performance of the constructed
models was examined using both nominal and faulty data. The experimental
results are presented in the following paragraphs.
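
The decision logic of such a bank of models can be sketched as follows: every model produces its own residual, and the behaviour class whose model yields the smallest residual is selected. This is only an illustrative sketch; the scoring rule (mean absolute residual), the function names and the numbers are assumptions, not the evaluation procedure used in the experiments.

```python
def mean_abs(values):
    """Mean absolute value of a sequence."""
    return sum(abs(v) for v in values) / len(values)


def isolate(measured, model_outputs):
    """Pick the behaviour class whose model output is closest to the measurement."""
    scores = {
        name: mean_abs([y - y_hat for y, y_hat in zip(measured, output)])
        for name, output in model_outputs.items()
    }
    return min(scores, key=scores.get), scores


# Made-up measurement window and outputs of the four models in the bank.
measured = [1.0, 1.2, 1.1, 0.9]
model_outputs = {
    "normal": [2.0, 2.1, 2.0, 1.9],
    "f1": [1.5, 1.6, 1.6, 1.4],
    "f2": [1.0, 1.1, 1.1, 1.0],
    "f3": [0.2, 0.3, 0.2, 0.1],
}
label, scores = isolate(measured, model_outputs)
```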



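Table 9.6. Neural models for faulty scenarios

Fault   Structure               Filter order   Activation function
-----   ---------------------   ------------   -------------------
f1      $\mathcal{N}_{4,7,2}$   1              hyperbolic tangent
f2      $\mathcal{N}_{4,7,2}$   1              hyperbolic tangent
f3      $\mathcal{N}_{4,5,2}$   1              hyperbolic tangent
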



Fault $f_1$. This fault was simulated at time step 270 and lasted
about 275 time steps. The neural network was designed to model
the actuator under the fault $f_1$. During the occurrence of the fault $f_1$, the residuals
generated by the model should be near zero; otherwise the residuals should
have large values. As can be seen in Figs. 9.31(a) and 9.32(a), the model
mimics the behaviour of the actuator very well in both faulty and normal
operating conditions. In turn, in Figs. 9.31(b) and 9.32(b) one can see that
the residuals are near zero regardless of the operating conditions. This means
that this fault is undetectable by the neural model.

Fault $f_2$. The next faulty scenario was simulated from time step 775
(pressure off) to time step 1395 (pressure on). Figure 9.33(a) presents the
modelling of the flow through the actuator $F$. During the fault (time steps
775-1395), the output of the neural model (grey) follows the output of
the actuator (black).

Fig. 9.31. Fault $f_1$. The output $F$ of the actuator (black)
and the output of the model (grey) (a), the residual (b)

Fig. 9.32. Fault $f_1$. The output $X$ of the actuator (black)
and the output of the model (grey) (a), the residual (b)



The residual clearly shows that changes caused by the fault are reliably
and immediately detected by the neural model (Fig. 9.33(b)). The results
obtained for the servo-motor rod displacement $X$ (Fig. 9.34(a)) are somewhat
worse. However, by analyzing the residual one can easily observe the
occurrence of the fault. During the fault the residual stays close to zero,
whereas under the normal operating conditions the residual deviates from
zero (Fig. 9.34(b)).

Fig. 9.33. Fault $f_2$. The output $F$ of the actuator (black)
and the output of the model (grey) (a), the residual (b)

Fig. 9.34. Fault $f_2$. The output $X$ of the actuator (black)
and the output of the model (grey) (a), the residual (b)

Fault $f_3$. This fault was simulated from time step 860 (valve opening)
to time step 1860 (valve closing). In this case the neural model faithfully
reproduces both output signals: the flow $F$ and the rod displacement $X$.
Figures 9.35(a) and 9.36(a) present the behaviour of the actuator (black) and
the neural model (grey) under the normal operating conditions and during the
fault (time steps 860-1860). Both residuals confirm that by using a neural
model the fault $f_3$ can be detected and isolated distinctly (Figs. 9.35(b)
and 9.36(b)).

Fig. 9.35. Fault $f_3$. The output $F$ of the actuator (black)
and the output of the model (grey) (a), the residual (b)

Fig. 9.36. Fault $f_3$. The output $X$ of the actuator (black)
and the output of the model (grey) (a), the residual (b)

9.6. Summary 

This chapter provides a survey of the most important neural network architectures
and the possibilities of their application in the fault diagnosis of dynamic
non-linear systems. The first stage of fault diagnosis is to generate the residual
signal. In order to perform this operation, artificial neural networks with
dynamic characteristics can be used. Neural networks can be seen here as an
alternative solution to the classical methods such as Luenberger observers or
Kalman filters. Moreover, given a sufficiently large structure, neural networks
can approximate a wide class of non-linear processes with arbitrary accuracy.
The second stage of fault diagnosis is residual evaluation. Artificial neural
networks are used here as fault classifiers. A neural
network examines a possible fault or abnormal features in system outputs
and gives a fault classification signal to declare whether the system is faulty
or not. A fault classifier can be designed easily using the self-learning
ability of neural networks based on a set of training patterns. The included
examples show that by using artificial neural networks one can design effective
fault diagnosis systems. Neural networks fulfil their tasks with
high accuracy on both simulated and real process data.

Chapter 10



PARAMETRIC AND NEURAL NETWORK
WIENER AND HAMMERSTEIN MODELS
IN FAULT DETECTION AND ISOLATION



Andrzej JANCZAK* 



10.1. Introduction 

In the last two decades, model-based fault detection and isolation (FDI) has
been investigated intensively (Frank, 1990; Chen and Patton, 1999; Patton
et al., 1989). These methods require both a nominal model of the system
considered, i.e., a model of the system under its normal operating conditions,
and models of the system under its faulty conditions. The nominal model is
used in the fault detection step to generate residuals, defined as a difference
between the output signals of the system and its model. The analysis of these
residuals answers the question whether a fault has occurred or not. If it
has, the fault isolation step is performed in a similar way, by analyzing
residual sequences generated with the models of the system under its faulty
conditions (Fig. 10.1).

In the case of complex industrial systems, e.g., a five-stage sugar evapora- 
tion station, the above procedure can be even more useful if it is applied not 
only to the overall system but also to its chosen sub-modules. Designing an 
FDI system with such an approach may result in both high fault detection 
sensitivity and high fault isolation reliability.

Fig. 10.1. FDI with the nominal model and a bank
of models of the system under faulty conditions



The chapter starts with definitions of Wiener and Hammerstein models, 
a short survey of their applications to both industrial and biological systems, 
and a survey of the known identification methods including correlation meth- 
ods, parametric regression methods, non-parametric regression methods, gra- 
dient optimization methods, and the neural network approach. The structures 
of two basic neural network Wiener and Hammerstein models, i.e., feedfor- 
ward series-parallel models and recurrent parallel models, are introduced in 
the following part of the chapter. Then, the modelling of a residual generation 
process with parametric and neural network models of residual generators is 
considered. Neural network residual generators can also be applied to systems
with steady-state characteristics that cannot be approximated by polynomials.
The last part of this chapter presents the identification of two models
of vapour pressure dynamics in a five-stage sugar evaporation station. These 
models, i.e., a linear model and a neural network Wiener model, are identified 
based on real system data recorded at the Lublin Sugar Factory. 



10.2. Wiener and Hammerstein models 

Wiener and Hammerstein models are used to describe non-linear dynamic
systems with separated linear dynamic and non-linear static properties. In
other words, Wiener and Hammerstein models are two examples of simple
non-linear structures composed of a linear dynamic system in cascade with
a static non-linear element. While in the Wiener model the linear dynamic
system precedes the static non-linear element, in the Hammerstein model the
same blocks are connected in reverse order (Fig. 10.2). Clearly, the assumption
that the dynamic and static properties of a system can be separated is
a restrictive one.

Fig. 10.2. (a) Wiener system, (b) Hammerstein system

In spite of this, both these models are among the most frequently
applied models of non-linear dynamic systems. This stems from the
fact that the Hammerstein model can adequately characterize control systems
with dominating non-linear properties of system actuators, while the Wiener
model is more suitable for systems with non-linear sensors. Non-linear systems
that can be modelled as Wiener and Hammerstein models are widely
encountered in different areas including industry, biology, sociology, and
psychology. The pH neutralization process (Kalafatis et al., 1997; Nie and Lee,
1998) and the chromatographic separation process (Visala et al., 2000) are
two well-known industrial examples of Wiener systems. Other industrial examples
of Wiener systems include systems with non-linear sensors (Wigren,
1994), extremum control systems, fluid flow control (Wigren, 1994), and electrical
resistance furnaces (Skoczowski, 1998). The Hammerstein model can
describe well systems with non-linear actuators (Zi-Qiang, 1994). Industrial
examples of Hammerstein systems include a distillation column, a
heat exchanger (Eskinat et al., 1991), a continuous copolymerization process
(Su and McAvoy, 1993), and a sugar evaporator (Janczak, 2000a). Some biological
processes can also be considered as Wiener or Hammerstein systems,
including a muscle relaxation process (Drewelow et al., 1997); see Hunter
and Korenberg (1986) for more biological examples. An obvious advantage of
Wiener and Hammerstein models with invertible non-linear characteristics is
that their non-linear properties can be corrected easily with series static correction
elements. Therefore, Wiener and Hammerstein models are very useful
in engineering practice, particularly in controller design. Another
important advantage of these models is that the stability of both
models is determined exclusively by their linear parts, and it can be easily
examined.

Let $f(\cdot)$ denote the steady-state characteristic of the non-linear element,
$a_1, \ldots, a_{na}$, $b_1, \ldots, b_{nb}$ the parameters of the linear dynamic system, and
$q^{-1}$ the backward shift operator. The output $y_i$ of a discrete Wiener system
to the input $u_i$ at the discrete time $i$ is

$$y_i = f\left(\frac{B(q^{-1})}{A(q^{-1})}\, u_i\right), \tag{10.1}$$

where

$$A(q^{-1}) = 1 + a_1 q^{-1} + \cdots + a_{na} q^{-na},$$

$$B(q^{-1}) = b_1 q^{-1} + \cdots + b_{nb} q^{-nb}.$$

For a discrete Hammerstein system, the output $y_i$ to the input $u_i$ is

$$y_i = \frac{B(q^{-1})}{A(q^{-1})}\, f(u_i). \tag{10.2}$$
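
The two block orderings in (10.1) and (10.2) can be illustrated with a short simulation. The sketch below is only an example under assumed values: the first-order linear part, the hyperbolic tangent non-linearity and the function names are chosen for illustration and do not come from the chapter.

```python
import math


def linear_filter(a, b, u):
    """Simulate A(q^-1) s_i = B(q^-1) u_i with
    A = 1 + a[0] q^-1 + ... and B = b[0] q^-1 + ... (zero initial conditions)."""
    s = []
    for i in range(len(u)):
        acc = 0.0
        for j, aj in enumerate(a, start=1):
            if i - j >= 0:
                acc -= aj * s[i - j]
        for j, bj in enumerate(b, start=1):
            if i - j >= 0:
                acc += bj * u[i - j]
        s.append(acc)
    return s


def wiener(a, b, f, u):
    """Linear dynamics first, static non-linearity second, cf. (10.1)."""
    return [f(s) for s in linear_filter(a, b, u)]


def hammerstein(a, b, f, u):
    """Static non-linearity first, linear dynamics second, cf. (10.2)."""
    return linear_filter(a, b, [f(x) for x in u])


a, b = [-0.5], [1.0]          # assumed A = 1 - 0.5 q^-1, B = q^-1
u = [1.0, 1.0, 1.0, 1.0]      # unit step input
y_w = wiener(a, b, math.tanh, u)
y_h = hammerstein(a, b, math.tanh, u)
```

For the same input the two responses differ, because the non-linearity acts on the filtered signal in the Wiener case and on the raw input in the Hammerstein case.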



10.3. Identification of Wiener and Hammerstein systems 

The identification of Wiener and Hammerstein systems has been investigated 
intensively for the past few decades. Several methods that have been devel- 
oped can be divided into the following four classes. 

A. Correlation methods 

These methods are based on the theory of separable processes (Billings and 
Fakhouri, 1978a; 1978b; 1982). Correlation analysis makes it possible to
separate the identification of the linear dynamic system and the identification 
of the non-linear element using a white Gaussian noise input with a non-zero 
mean (Billings and Fakhouri, 1978a; 1978b). Moreover, the relationship be- 
tween the first and second order correlation functions provides information 
regarding the system structure. For both Wiener and Hammerstein systems, 
the first order correlation function is directly proportional to the linear system 
impulse response. Then, the linear system impulse response can be param- 
eterized using the well-known linear regression methods. Testing the second 
order correlation function provides valuable information about the system 
structure. If the second order correlation function is equal to the first order 
correlation function, except for a constant of proportionality, the system is 
of the Hammerstein type. For Wiener systems, the second order correlation 
function is the square of the first order correlation function, except for a 
constant of proportionality. With the linear dynamic system identified, the 
static non-linear element can be identified easily, e.g., assuming that it can 
be expressed as a polynomial of a finite and known order and using linear 
regression techniques.

B. Linear regression methods

The linear regression solutions are based on the restrictive assumption that
the static non-linear element can be represented by a series expansion, commonly
a polynomial, of a finite and known order. For Hammerstein models,
Narendra and Gallman (1966), and Chang and Luus (1971) developed
least-squares identification algorithms. The method by Haist et al. (1973) for
Hammerstein systems with correlated noise is based on the identification of
the Hammerstein model as well as the noise model. Other approaches of this
type assume that the system steady-state characteristic can be approximated
by Taylor series expansions (Chung and Sun, 1988), and block pulse function
expansions (Kung and Shih, 1986). Similar algorithms can be used to identify
an inverse Wiener model but this requires the static non-linear element to be
invertible. Moreover, the application of output error type methods requires
the linear dynamic model to be minimum phase, a condition that is difficult
to guarantee in the discrete time case. The least squares method was used
by Kalafatis et al. (1995) with a frequency response filter model describing
the linear part and a polynomial model of the inverse non-linear element.
Pearson and Pottman (2000) proposed the weighted least squares approach
for the identification of pulse transfer function parameters on the assumption
that the inverse non-linear characteristic is known. Introducing a modified
definition of the identification error, a least squares approach was proposed
(Janczak, 2001) for the identification of both the pulse transfer function and
a polynomial model of the inverse non-linear element. A similar idea was used
by Marciak et al. (2001), in a method that uses orthonormal basis functions
for modelling a linear dynamic system.

C. Non-parametric regression methods

The class of Wiener and Hammerstein systems identified using the correlation
or linear regression methods is restricted by the assumption that the
non-linear function $f(\cdot)$ of the static non-linear element is continuous and can
be expressed as a polynomial of a finite and known order $r$. This assumption
is no longer necessary if we use non-parametric identification methods, which
allow one to identify a larger class of non-linear characteristics including all
bounded, square integrable or Lipschitz functions. Non-parametric methods
do not require extensive knowledge about the non-linear element and the
non-linear characteristic is treated as a regression function (Greblicki and
Pawlak, 1986; 1987). Other approaches use non-parametric procedures for
the estimation of the parameters of finite power series (Lang, 1994; 1997),
Legendre polynomials (Pawlak, 1991), and orthogonal series (Greblicki, 1989;
1994).

D. Non-linear optimization methods 

In these methods, models are expressed as non-linear in parameters. Model
parameters are estimated, usually with gradient optimization techniques, to
minimize a chosen error function. An example of such an approach to the
identification of Hammerstein models is the method that uses a Laguerre function
expansion of a finite order to represent the linear dynamic system and a
polynomial model of the non-linear element (Thathachar and Ramaswamy, 1973).
Prediction error methods proposed by Wigren (1993; 1994) for Wiener systems
can also be included in this group. Neural network approaches to the
identification of Wiener and Hammerstein models can be included in this
group of methods as well. In neural network Wiener and Hammerstein models,
a neural network of the multilayer perceptron architecture is used as a
model of the non-linear element and a single linear neuron is used as a model
of the linear dynamic system (Al-Duwaish et al., 1997; Janczak, 1995; 1996;
Korbicz and Janczak, 1996; Su and McAvoy, 1993). Series-parallel models,
which do not contain any feedback connections, can be adjusted with the
computationally effective backpropagation algorithm. As for linear
systems, identification with series-parallel models of Wiener or Hammerstein
systems with an additive output white noise results in correlated residuals.
More useful in such a situation are parallel Wiener and Hammerstein
models, which contain feedback connections and are therefore dynamic systems
themselves. Unfortunately, gradient calculation in dynamic neural networks
requires more computationally intensive methods, known as the sensitivity
method or backpropagation through time (Janczak, 1997a; 1997b). Both
Wiener and Hammerstein models contain a linear dynamic system. Thus, it
is also possible to combine the computationally effective least squares method
with the gradient descent algorithm (Al-Duwaish et al., 1997; Janczak, 1998a;
1998b).

10.4. Parametric and neural network Wiener 
and Hammerstein models 

Two basic configurations of Wiener and Hammerstein models are known as
series-parallel and parallel models. In series-parallel models, the model output
depends on the past values of the system input $u_{i-j}$, $j = 1, 2, \ldots, nb$, and
the past values of the system output $y_{i-j}$, $j = 1, 2, \ldots, na$. Only a one
step ahead prediction of the system output can be made with series-parallel
models as the computation of the model output at time $i$ requires the system
output at the previous time instants. Parallel models, on the other hand,
do not use the past system outputs at all as their output is computed based
on the past values of the system input $u_{i-j}$, $j = 1, 2, \ldots, nb$, and the past
values of the model output $\hat{y}_{i-j}$, $j = 1, 2, \ldots, na$. Therefore, a multi step
ahead prediction of the system output, or a simulation of system behaviour,
is possible with parallel models. When estimating the parameters of a model
from input-output measurements, a chosen error function is minimized. The error
defined for series-parallel models is known as the equation error. For parallel
models, the identification error is called the output error (Ljung, 1999).

At the time $i$, the output $\hat{y}_i$ of the series-parallel Hammerstein
model is

$$\hat{y}_i = [1 - A(q^{-1})]\, y_i + B(q^{-1}) f(u_i), \tag{10.3}$$

with

$$A(q^{-1}) = 1 + a_1 q^{-1} + \cdots + a_{na} q^{-na},$$

$$B(q^{-1}) = b_1 q^{-1} + \cdots + b_{nb} q^{-nb},$$

where $f(\cdot)$ in (10.3) is a non-linear function describing the non-linear element,
and $a_1, \ldots, a_{na}$, $b_1, \ldots, b_{nb}$ are the parameters of the linear dynamic
model.

The output $\hat{y}_i$ of the series-parallel Wiener model at the time $i$ is

$$\hat{y}_i = f(\hat{s}_i), \tag{10.4}$$

with

$$\hat{s}_i = [1 - A(q^{-1})]\, f^{-1}(y_i) + B(q^{-1}) u_i, \tag{10.5}$$

where the function $f^{-1}(\cdot)$ is a characteristic of the inverse model of the
non-linear element.

For the parallel Hammerstein model, the model output $\hat{y}_i$ is given by the
following expression:

$$\hat{y}_i = [1 - A(q^{-1})]\, \hat{y}_i + B(q^{-1}) f(u_i). \tag{10.6}$$

In the parallel Wiener model, there is no inverse model of the non-linear
element. Therefore, models of this kind can be used for the modelling of not
only Wiener systems with invertible non-linearities but also those with
non-invertible static non-linearities. Expressions describing the parallel Wiener
model have the form

$$\hat{y}_i = f(\hat{s}_i), \tag{10.7}$$

$$\hat{s}_i = [1 - A(q^{-1})]\, \hat{s}_i + B(q^{-1}) u_i. \tag{10.8}$$
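
The practical difference between the series-parallel form (10.3) and the parallel form (10.6) is which past outputs are fed back. The following sketch shows this for a Hammerstein model; the coefficients, the quadratic non-linearity, the measured outputs and the function name are illustrative assumptions only.

```python
def predict(a, b, f, u, y=None):
    """Hammerstein model output.

    With measured outputs y: series-parallel model (10.3), a one step
    ahead predictor. With y=None: parallel model (10.6), which feeds
    back its own past outputs and therefore simulates the system.
    """
    y_hat = []
    for i in range(len(u)):
        past = y if y is not None else y_hat
        acc = 0.0
        for j, aj in enumerate(a, start=1):
            if i - j >= 0:
                acc -= aj * past[i - j]
        for j, bj in enumerate(b, start=1):
            if i - j >= 0:
                acc += bj * f(u[i - j])
        y_hat.append(acc)
    return y_hat


a, b = [-0.5], [1.0]                # assumed A = 1 - 0.5 q^-1, B = q^-1
f = lambda x: x ** 2                # assumed static non-linearity
u = [1.0, 2.0, 1.0, 0.0]
y_measured = [0.0, 1.2, 4.5, 3.0]   # made-up system outputs

one_step = predict(a, b, f, u, y_measured)  # uses y_measured[i - 1]
simulated = predict(a, b, f, u)             # uses its own past outputs
```

The two outputs coincide only as long as the measured and simulated past outputs agree, which is why the resulting identification errors (equation error and output error) behave differently.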



10.4.1. Parametric models 

Assume that the non-linear function $f(\cdot)$ of a Hammerstein system can
be approximated with a polynomial of the order $r$ and the parameters
$\gamma_0, \gamma_1, \ldots, \gamma_r$:

$$f(u_i) = \gamma_0 + \gamma_1 u_i + \cdots + \gamma_r u_i^r. \tag{10.9}$$

Then the one step ahead prediction of the system output (10.3) can be written
in the linear-in-parameters form. For Wiener systems, the model output can
be expressed as linear in parameters if the model is defined in a modified
series-parallel form with the inverse non-linear element model described by a
polynomial with the parameters $\lambda_0, \lambda_1, \ldots, \lambda_r$:

$$f^{-1}(y_i) = \lambda_0 + \lambda_1 y_i + \cdots + \lambda_r y_i^r. \tag{10.10}$$

Linear-in-parameters forms of a model output make it possible to employ
the least squares method for parameter estimation. Taking into consideration
(10.9) in the parallel Hammerstein model (10.6) or (10.10) in both the
series-parallel (10.4), (10.5) and the parallel Wiener model (10.7), (10.8) leads
to models non-linear in parameters, which require non-linear optimization
methods for parameter estimation.
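
As a worked illustration of the linear-in-parameters form, substituting the polynomial (10.9) into the series-parallel Hammerstein model (10.3) and expanding the polynomials in $q^{-1}$ gives

$$\hat{y}_i = -\sum_{j=1}^{na} a_j y_{i-j} + c_0 + \sum_{k=1}^{r} \sum_{j=1}^{nb} c_{kj}\, u_{i-j}^k, \qquad c_0 = \gamma_0 \sum_{j=1}^{nb} b_j, \qquad c_{kj} = \gamma_k b_j.$$

The symbols $c_0$ and $c_{kj}$ are introduced here only for this illustration. The model output is linear in $a_j$, $c_0$ and $c_{kj}$, so these redefined parameters can be estimated with the least squares method, although the products $\gamma_k b_j$ make the parameterization redundant.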



10.4.2. Neural network models 



Multilayer perceptrons with at least one hidden layer, due to their experimentally
proven approximation properties, belong to the most widely applied
neural network architectures. Neural networks of this type are
universal approximators, i.e., they are able to approximate any continuous
function to an arbitrary degree of accuracy provided that the number of
non-linear nodes is sufficiently large. Wiener and Hammerstein systems can
be modelled using neural networks of a suitably specified architecture. As
multilayer perceptrons with one hidden layer are most common, we assume
that the model of the non-linear element is a multilayer perceptron that contains
one hidden layer of $M$ non-linear processing elements with the non-linear
activation function $\phi(\cdot)$:

$$f(u_i, \mathbf{w}_f) = \sum_{l=1}^{M} w_{2M+l}\, \phi(x_{i,l}) + w_{3M+1}, \tag{10.11}$$

with

$$x_{i,l} = w_l u_i + w_{M+l}, \tag{10.12}$$

where $\mathbf{w}_f = [w_1, \ldots, w_{3M+1}]^T$, $w_1, \ldots, w_{2M}$ are the weights of the hidden
layer nodes, and $w_{2M+1}, \ldots, w_{3M+1}$ are the weights of the output node. The
linear part of both systems is modelled with a single linear node with two
tap delay lines. We assume also that a multilayer perceptron model of the
inverse non-linear element has the same architecture containing one hidden
layer of $M$ non-linear neurons. Thus, the output of the inverse non-linear
element model can be expressed as

$$f^{-1}(y_i, \mathbf{v}_f) = \sum_{l=1}^{M} v_{2M+l}\, \phi(z_{i,l}) + v_{3M+1}, \tag{10.13}$$

with

$$z_{i,l} = v_l y_i + v_{M+l}, \tag{10.14}$$

where $\mathbf{v}_f = [v_1, \ldots, v_{3M+1}]^T$, $v_1, \ldots, v_{2M}$ denote the weights of the hidden
layer nodes, and $v_{2M+1}, \ldots, v_{3M+1}$ are the weights of the output neuron.
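
The weight indexing of (10.11) and (10.12) can be made explicit with a direct evaluation of the network output. This is a minimal sketch assuming a hyperbolic tangent activation and a weight vector stored as a 0-based Python list; the function name and the numerical weights are illustrative only.

```python
import math


def mlp_nonlinearity(u, w, M, phi=math.tanh):
    """Evaluate (10.11)-(10.12) for a scalar input u.

    w holds [w_1, ..., w_{3M+1}] in a 0-based list, so w[k - 1] is w_k.
    """
    assert len(w) == 3 * M + 1
    out = w[3 * M]                               # output bias w_{3M+1}
    for node in range(1, M + 1):
        x = w[node - 1] * u + w[M + node - 1]    # x_{i,l} = w_l u_i + w_{M+l}
        out += w[2 * M + node - 1] * phi(x)      # w_{2M+l} * phi(x_{i,l})
    return out


M = 2
# Layout: hidden input weights, hidden biases, output weights, output bias.
w = [1.0, -1.0, 0.0, 0.0, 0.5, 0.5, 0.1]
value = mlp_nonlinearity(0.3, w, M)
```

In this example the two hidden nodes cancel each other, so only the output bias remains. The inverse model (10.13), (10.14) has the same form with the input $y_i$ and the weights $v$.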

10.5. Fault detection. Estimating parameter changes

The problem considered here can be stated as follows: Given the system input
and output sequences and knowing the nominal models of a Wiener or Hammerstein
system, generate a sequence of residuals, and process this sequence
to detect and isolate all changes of system parameters caused by any system
fault. Both abrupt (step-like) and incipient (slowly developing) faults are to
be considered. Assume that the nominal models of the Wiener or Hammerstein
system defined by the non-linear function $f(\cdot)$ and the polynomials
$A(q^{-1})$ and $B(q^{-1})$ are known. These models describe the systems under
their normal operating conditions with no malfunctions (faults). Moreover,
assume that at the time $k$ a step-like fault occurred, and caused a change in
the mathematical model of the system. This change can be expressed in terms
of the additive components of pulse transfer function polynomials. The pulse
transfer function polynomials $\mathbf{A}(q^{-1})$ and $\mathbf{B}(q^{-1})$ under faulty conditions
are

$$\mathbf{A}(q^{-1}) = A(q^{-1}) + \Delta A(q^{-1}), \tag{10.15}$$

$$\mathbf{B}(q^{-1}) = B(q^{-1}) + \Delta B(q^{-1}), \tag{10.16}$$

where

$$\Delta A(q^{-1}) = \alpha_1 q^{-1} + \cdots + \alpha_{na} q^{-na}, \tag{10.17}$$

$$\Delta B(q^{-1}) = \beta_1 q^{-1} + \cdots + \beta_{nb} q^{-nb}. \tag{10.18}$$

The characteristic of the static non-linear element $g(\cdot)$ under faulty conditions
can be expressed as a sum of $f(\cdot)$ and its change $\Delta f(\cdot)$:

$$g(u_i) = f(u_i) + \Delta f(u_i), \tag{10.19}$$

where

$$\Delta f(u_i) = \eta_0 + \eta_2 u_i^2 + \eta_3 u_i^3 + \cdots. \tag{10.20}$$

For Wiener systems, it is assumed that both $f(\cdot)$ and $g(\cdot)$ are invertible.
The inverse function under faulty conditions can be written as a sum of the
inverse function $f^{-1}(\cdot)$ and its change $\Delta f^{-1}(\cdot)$:

$$g^{-1}(y_i) = f^{-1}(y_i) + \Delta f^{-1}(y_i), \tag{10.21}$$

where

$$\Delta f^{-1}(y_i) = \cdots. \tag{10.22}$$

Note that $\Delta f^{-1}(\cdot)$ here does not denote the inverse of $\Delta f(\cdot)$ but only a
change in the inverse non-linear characteristic.

10.5.1. Definitions of the identification error

The residual $e_i$ is defined as a difference between the output of the system
$y_i$ and the output of its nominal model $\hat{y}_i$:

$$e_i = y_i - \hat{y}_i. \tag{10.23}$$

Both series-parallel and parallel models can be used for residual generation.
With the nominal series-parallel Hammerstein model (Fig. 10.3) given by the
following expression:

$$\hat{y}_i = [1 - A(q^{-1})]\, y_i + B(q^{-1}) f(u_i), \tag{10.24}$$

we have

$$e_i = A(q^{-1})\, y_i - B(q^{-1}) f(u_i). \tag{10.25}$$

Fig. 10.3. Generation of residuals using the series-parallel Hammerstein model

The output of the Hammerstein system under faulty conditions is

$$y_i = \frac{\mathbf{B}(q^{-1})}{\mathbf{A}(q^{-1})}\, g(u_i) = \frac{B(q^{-1}) + \Delta B(q^{-1})}{A(q^{-1}) + \Delta A(q^{-1})}\, [f(u_i) + \Delta f(u_i)]. \tag{10.26}$$

Thus, writing (10.26) in the form

$$y_i = [1 - A(q^{-1}) - \Delta A(q^{-1})]\, y_i + [B(q^{-1}) + \Delta B(q^{-1})]\, [f(u_i) + \Delta f(u_i)] \tag{10.27}$$

and substituting (10.24) and (10.27) into (10.23), a residual equation expressed
in terms of changes of the polynomials $A(q^{-1})$, $B(q^{-1})$ and the
function $f(u_i)$ is obtained:

$$e_i = -\Delta A(q^{-1})\, y_i + \Delta B(q^{-1}) f(u_i) + [B(q^{-1}) + \Delta B(q^{-1})]\, \Delta f(u_i). \tag{10.28}$$

Similar deliberations for the parallel Hammerstein model (Fig. 10.4) result
in the following expression for $e_i$:

$$e_i = \frac{A(q^{-1}) \Delta B(q^{-1}) - B(q^{-1}) \Delta A(q^{-1})}{A(q^{-1})\, [A(q^{-1}) + \Delta A(q^{-1})]}\, f(u_i) + \frac{B(q^{-1}) + \Delta B(q^{-1})}{A(q^{-1}) + \Delta A(q^{-1})}\, \Delta f(u_i). \tag{10.29}$$

Fig. 10.4. Generation of residuals using the parallel Hammerstein model
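
The series-parallel residual equation (10.28) can be checked numerically on a small example. The sketch below simulates a first-order Hammerstein system under a fault, computes the residual with the nominal series-parallel model (10.24), and compares it with the right-hand side of (10.28). All numerical values and the particular forms of $f(\cdot)$ and $\Delta f(\cdot)$ are assumptions made for the illustration.

```python
def f(u):
    """Nominal static characteristic (assumed for the example)."""
    return u + 0.2 * u ** 3


def delta_f(u):
    """Change of the characteristic caused by the fault, cf. (10.20)."""
    return 0.05 + 0.1 * u ** 2


a1, b1 = -0.7, 0.5           # nominal A = 1 + a1 q^-1, B = b1 q^-1
alpha1, beta1 = 0.1, -0.05   # fault-induced changes of a1 and b1

# Faulty system (10.27), simulated from zero initial conditions.
u = [0.0, 1.0, -0.5, 0.8, 0.3, -1.0, 0.6]
y = [0.0]
for i in range(1, len(u)):
    g_prev = f(u[i - 1]) + delta_f(u[i - 1])
    y.append(-(a1 + alpha1) * y[i - 1] + (b1 + beta1) * g_prev)

residual, predicted = [], []
for i in range(1, len(u)):
    y_hat = -a1 * y[i - 1] + b1 * f(u[i - 1])        # nominal model (10.24)
    residual.append(y[i] - y_hat)                    # definition (10.23)
    predicted.append(-alpha1 * y[i - 1]              # right-hand side of (10.28)
                     + beta1 * f(u[i - 1])
                     + (b1 + beta1) * delta_f(u[i - 1]))

max_gap = max(abs(r - p) for r, p in zip(residual, predicted))
```

Up to rounding errors, `max_gap` is zero, which confirms that the residual depends only on the parameter changes and on measurable signals.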



Residual generation for Wiener systems with the series-parallel Wiener
model, given by (10.4) and (10.5), is more complicated as both the model of
the non-linear element and the model of the inverse non-linear element are
used. A simpler form of the series-parallel model can be obtained by introducing
the following modified definition of the identification error (Janczak, 1999):

$$e_i = f^{-1}(y_i) - \hat{f}^{-1}(y_i), \tag{10.30}$$

where $\hat{f}^{-1}(y_i)$ denotes the output of the modified series-parallel model.
The output of the linear dynamic system under normal operating conditions
is

$$f^{-1}(y_i) = \frac{B(q^{-1})}{A(q^{-1})}\, u_i. \tag{10.31}$$

The equation (10.31) can be written in the form

$$f^{-1}(y_i) = [1 - A(q^{-1})]\, f^{-1}(y_i) + B(q^{-1}) u_i. \tag{10.32}$$

Therefore, the modified series-parallel model can be defined as follows:

$$\hat{f}^{-1}(y_i) = [1 - A(q^{-1})]\, f^{-1}(y_i) + B(q^{-1}) u_i. \tag{10.33}$$

Fig. 10.5. Generation of residuals using the series-parallel Wiener model

Fig. 10.6. Generation of residuals using a modified series-parallel Wiener model

Residual generation in a scheme with the modified series-parallel Wiener
model is illustrated in Fig. 10.6. The signal $f^{-1}(y_i)$ for the Wiener system
under its faulty conditions can be expressed as

$$f^{-1}(y_i) = [1 - A(q^{-1}) - \Delta A(q^{-1})]\, [f^{-1}(y_i) + \Delta f^{-1}(y_i)] + [B(q^{-1}) + \Delta B(q^{-1})]\, u_i - \Delta f^{-1}(y_i). \tag{10.34}$$

Hence, the following expression for the error (10.30) is obtained:

$$e_i = -\Delta A(q^{-1})\, f^{-1}(y_i) + \Delta B(q^{-1}) u_i - [A(q^{-1}) + \Delta A(q^{-1})]\, \Delta f^{-1}(y_i). \tag{10.35}$$

An important advantage of both the series-parallel Hammerstein model 
and the modified series-parallel Wiener model is that their residual equa- 
tions (10.28) and (10.35), after a suitable change of parameterization, can be 
written in linear-in-parameters forms. Then, these redefined parameters can 
be estimated with linear regression methods. 
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10.5.2. Hammerstein system. Parameter estimation 
of the residual equation 

Assume that the function A/(*), describing a change of the steady-state 
characteristic /(•) caused by a fault, has the form of a polynomial of the 
order r: 

Af{ui) =rio-\- ri2ul H h r]ru\. (10.36) 

Then the residual equation (10.28) can be written in the following linear-in- 
parameters form: 

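$$e_i = -\sum_{j=1}^{na} \alpha_j y_{i-j} + \sum_{j=1}^{nb} \beta_j f(u_{i-j}) + d_0 + \sum_{k=2}^{r}\sum_{j=1}^{nb} d_{kj} u_{i-j}^{k}, \tag{10.37}$$

where

$$d_0 = \eta_0 \sum_{j=1}^{nb} (b_j + \beta_j), \tag{10.38}$$

$$d_{kj} = \eta_k (b_j + \beta_j). \tag{10.39}$$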

Note that (10.37) has $M = na + nb + nb(r - 1) + 1$ unknown parameters
$\alpha_j$, $\beta_j$, $d_{kj}$ and $d_0$. In a disturbance-free case, these parameters can be
calculated by performing $N = M$ measurements of the input and output signals
and solving the following set of linear equations:



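$$e_i = -\sum_{j=1}^{na} \alpha_j y_{i-j} + \sum_{j=1}^{nb} \beta_j f(u_{i-j}) + d_0 + \sum_{k=2}^{r}\sum_{j=1}^{nb} d_{kj} u_{i-j}^{k}, \quad i = 0, 1, \ldots, M-1. \tag{10.40}$$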
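As an illustration of the linear-in-parameters form, the following sketch simulates a disturbance-free faulty Hammerstein system, forms the residual of the nominal series-parallel model as $e_i = A(q^{-1})y_i - B(q^{-1})f(u_i)$ (assumed here to be the residual definition behind (10.28)), and recovers the parameter changes by least squares. All numerical values of the changes and of the coefficients of $\Delta f$, as well as the helper name `lagged`, are assumptions made for the example.

```python
import numpy as np

def lagged(x, j):
    """Sequence x delayed by j samples (zeros before the start)."""
    return np.concatenate([np.zeros(j), x[:len(x) - j]])

rng = np.random.default_rng(1)
n, na, nb, r = 400, 2, 2, 3
u = rng.uniform(-1.0, 1.0, n)

a = np.array([-1.5, 0.7]);  alpha = np.array([-0.25, 0.15])  # a_j and assumed changes
b = np.array([0.5, -0.3]);  beta = np.array([-0.2, 0.1])     # b_j and assumed changes
eta = {0: 0.05, 2: -0.25, 3: -0.2}                           # assumed Delta f coefficients

f = np.tanh(u)
g = f + eta[0] + eta[2] * u**2 + eta[3] * u**3   # faulty characteristic f + Delta f

# Disturbance-free faulty system: (A + dA) y = (B + dB) g(u)
y = np.zeros(n)
for i in range(n):
    for j in (1, 2):
        if i - j >= 0:
            y[i] += (b[j-1] + beta[j-1]) * g[i-j] - (a[j-1] + alpha[j-1]) * y[i-j]

# Residual of the nominal series-parallel model
e = (y + sum(a[j-1] * lagged(y, j) for j in (1, 2))
       - sum(b[j-1] * lagged(f, j) for j in (1, 2)))

# Regression matrix whose rows follow the ordering of (10.37)
cols = [-lagged(y, j) for j in range(1, na + 1)]
cols += [lagged(f, j) for j in range(1, nb + 1)]
cols += [np.ones(n)]
cols += [lagged(u**k, j) for k in range(2, r + 1) for j in range(1, nb + 1)]
start = max(na, nb)                      # skip samples with incomplete history
X = np.column_stack(cols)[start:]
theta, *_ = np.linalg.lstsq(X, e[start:], rcond=None)

print(theta[:4])   # estimates of alpha_1, alpha_2, beta_1, beta_2
print(theta[4])    # estimate of d_0 = eta_0 * sum(b_j + beta_j)
```

Using more than $M$ equations, as done here, does not change the disturbance-free solution; it only makes the sketch less sensitive to an unlucky choice of samples.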



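In practice, it is more realistic to assume that the system output is disturbed
by the additive output disturbance $\varepsilon_i$. If $\varepsilon_i$ has the form

$$\varepsilon_i = \frac{1}{A(q^{-1}) + \Delta A(q^{-1})}\,\epsilon_i, \tag{10.41}$$

where $\epsilon_i$ is a zero mean white noise, then

$$\begin{aligned} e_i = {} & -\Delta A(q^{-1})y_i + \Delta B(q^{-1})f(u_i) \\ & + [B(q^{-1}) + \Delta B(q^{-1})]\Delta f(u_i) + \epsilon_i. \end{aligned} \tag{10.42}$$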



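Performing $N \gg M$ measurements of the input and output signals, the
parameters of the model (10.37) can be estimated using the least squares
method with the parameter vector $\hat{\theta}_i$ and the regression vector $x_i$ defined
as follows:

$$\hat{\theta}_i = [\hat{\alpha}_1 \ldots \hat{\alpha}_{na}\ \hat{\beta}_1 \ldots \hat{\beta}_{nb}\ \hat{d}_0\ \hat{d}_{2,1} \ldots \hat{d}_{2,nb} \ldots \hat{d}_{r,1} \ldots \hat{d}_{r,nb}]^T,$$

$$x_i = [-y_{i-1} \ldots -y_{i-na}\ f(u_{i-1}) \ldots f(u_{i-nb})\ 1\ u_{i-1}^2 \ldots u_{i-nb}^2 \ldots u_{i-1}^r \ldots u_{i-nb}^r]^T.$$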



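The vector $\hat{\theta}_i$ can be calculated on-line with the recursive least squares
method:

$$\hat{\theta}_i = \hat{\theta}_{i-1} + P_i x_i (e_i - x_i^T \hat{\theta}_{i-1}), \tag{10.43}$$

$$P_i = P_{i-1} - \frac{P_{i-1} x_i x_i^T P_{i-1}}{1 + x_i^T P_{i-1} x_i}, \tag{10.44}$$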



with $\hat{\theta}_0 = 0$ and $P_0 = \alpha I$, where $\alpha \gg 1$, and $I$ is an identity matrix. The
parameter estimation of the model (10.37) often results in asymptotically
biased estimates. That is the case if the system additive output disturbances
are not given by (10.41) but have the property of a zero mean white noise,
i.e., $\varepsilon_i = \epsilon_i$. In this case, the residual equation (10.28) takes the form



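$$\begin{aligned} e_i = {} & -\Delta A(q^{-1})y_i + \Delta B(q^{-1})f(u_i) + [B(q^{-1}) + \Delta B(q^{-1})]\Delta f(u_i) \\ & + [A(q^{-1}) + \Delta A(q^{-1})]\epsilon_i. \end{aligned} \tag{10.45}$$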

The term $[A(q^{-1}) + \Delta A(q^{-1})]\epsilon_i$ in (10.45) makes the parameter estimates
asymptotically biased and the residuals $e_i - x_i^T\hat{\theta}_{i-1}$ correlated. To obtain
unbiased parameter estimates, other known parameter estimation methods,
such as the instrumental variables method, the generalized least squares
method, or the extended least squares method, can be employed (Eykhoff,
1980; Söderström and Stoica, 1994). Using the extended least squares method,
the vectors $\hat{\theta}_i$ and $x_i$ are defined as follows:

$$\hat{\theta}_i = [\hat{\alpha}_1 \ldots \hat{\alpha}_{na}\ \hat{\beta}_1 \ldots \hat{\beta}_{nb}\ \hat{d}_0\ \hat{d}_{2,1} \ldots \hat{d}_{2,nb} \ldots \hat{d}_{r,1} \ldots \hat{d}_{r,nb}\ \hat{c}_1 \ldots \hat{c}_{na}]^T,$$

$$x_i = [-y_{i-1} \ldots -y_{i-na}\ f(u_{i-1}) \ldots f(u_{i-nb})\ 1\ u_{i-1}^2 \ldots u_{i-nb}^2 \ldots u_{i-1}^r \ldots u_{i-nb}^r\ \hat{e}_{i-1} \ldots \hat{e}_{i-na}]^T,$$

where $\hat{e}_i$ denotes the one-step-ahead prediction error of $e_i$:

$$\hat{e}_i = e_i - x_i^T\hat{\theta}_{i-1}. \tag{10.46}$$
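The recursion (10.43) and (10.44) can be sketched in a few lines. The function name `rls` and the default value of the initial covariance scale are choices made for this illustration; the demonstration data are synthetic and noise-free.

```python
import numpy as np

def rls(X, e, alpha=1e6):
    """Recursive least squares with theta_0 = 0 and P_0 = alpha * I.

    X holds one regression vector x_i per row, e holds the residuals e_i.
    """
    m = X.shape[1]
    theta = np.zeros(m)
    P = alpha * np.eye(m)
    for x, ei in zip(X, e):
        Px = P @ x
        P = P - np.outer(Px, Px) / (1.0 + x @ Px)      # covariance update (10.44)
        theta = theta + (P @ x) * (ei - x @ theta)     # parameter update (10.43)
    return theta

# Synthetic check: noise-free data generated from known parameters
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
theta_true = np.array([0.5, -1.0, 2.0])
theta_hat = rls(X_demo, X_demo @ theta_true)
print(theta_hat)
```

The extended least squares variant uses the same update; the difference is that the regression vector is extended with past prediction errors (10.46), so it has to be assembled inside the loop rather than beforehand.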



Example 10.1. A nominal Hammerstein model composed of the linear dynamic
system

$$\frac{B(q^{-1})}{A(q^{-1})} = \frac{0.5q^{-1} - 0.3q^{-2}}{1 - 1.5q^{-1} + 0.7q^{-2}} \tag{10.47}$$

and of the non-linear element

$$f(u_i) = \tanh(u_i) \tag{10.48}$$
was used in the simulation example. The Hammerstein system under its faulty
conditions is described as

$$\frac{B(q^{-1})}{A(q^{-1})} = \frac{\ldots - 0.2q^{-2}}{1 - 1.75q^{-1} + 0.85q^{-2}}, \tag{10.49}$$

$$\begin{aligned} g(u_i) = {} & \tanh(u_i) - 0.25u_i^2 - 0.2u_i^3 + 0.15u_i^4 - 0.1u_i^5 \\ & + 0.05u_i^6 - 0.025u_i^7 + 0.0125u_i^8. \end{aligned} \tag{10.50}$$

The steady-state characteristics of both the nominal model and the Hammerstein
system under its faulty conditions are shown in Fig. 10.7. The system
input was excited with a pseudorandom sequence of a uniform distribution
in the interval (-1, 1). The system output was disturbed by the additive
pseudorandom disturbance

$$\varepsilon_i = \frac{1}{A(q^{-1}) + \Delta A(q^{-1})}\,\epsilon_i \tag{10.51}$$

or

$$\varepsilon_i = \epsilon_i, \tag{10.52}$$

where $\epsilon_i$ are the values of a pseudorandom sequence of a uniform distribution
in the interval (-0.005, 0.005). For the system disturbed by (10.51),
the parameters of (10.40) were estimated with the Recursive Least Squares
(RLS$_{I}$) algorithm. In the case of the disturbance (10.52), parameter estimation
was performed using both the RLS$_{II}$ and the Recursive Extended Least
Squares (RELS$_{II}$) algorithms. The obtained parameter estimates are given
in Table 10.1. The identification error of $\Delta f(u_i)$ is shown in Fig. 10.8. A
comparison of estimation accuracy, based on four indices that measure
the estimation accuracy of the residuals $e_i$, the function $\Delta f(u_i)$, the parameters
$\eta_j$, $j = 2, \ldots, 8$, and the parameters $\alpha_j$ and $\beta_j$, $j = 1, 2$, is given in Table 10.2. The highest
accuracy is obtained using the RLS$_{I}$ algorithm for the system disturbed
by (10.51). For the system disturbed by (10.52), a comparison of the parameter
estimates and indices confirms the asymptotic bias of the parameter
estimates obtained with RLS$_{II}$.
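For readers who wish to reproduce the setting of the example, the nominal model (10.47) and (10.48) can be simulated as sketched below. The function name and the zero initial conditions are assumptions of this illustration; the faulty system is not simulated here.

```python
import numpy as np

def simulate_hammerstein(u, a, b, f=np.tanh):
    """Simulate y_i = -sum_j a_j y_{i-j} + sum_j b_j f(u_{i-j}), zero initial conditions."""
    v = f(np.asarray(u, dtype=float))   # output of the static non-linear element
    y = np.zeros(len(v))
    for i in range(len(v)):
        for j, (aj, bj) in enumerate(zip(a, b), start=1):
            if i - j >= 0:
                y[i] += bj * v[i - j] - aj * y[i - j]
    return y

# Nominal model: a = (-1.5, 0.7), b = (0.5, -0.3), f = tanh
rng = np.random.default_rng(0)
u = rng.uniform(-1.0, 1.0, 1000)        # excitation as described in the example
y = simulate_hammerstein(u, a=[-1.5, 0.7], b=[0.5, -0.3])

# The linear part has unit steady-state gain, B(1)/A(1) = 0.2/0.2,
# so a constant input u = 1 settles at tanh(1).
y_step = simulate_hammerstein(np.ones(400), a=[-1.5, 0.7], b=[0.5, -0.3])
print(y_step[-1], np.tanh(1.0))
```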

The above parameter estimation approach can be used if the function
$\Delta f(\cdot)$ has the form of the polynomial (10.36). For $\Delta f(\cdot)$ in the form of a
power series, changes of the linear dynamic system parameters and a change of
the steady-state characteristic can be estimated with a neural network model
of (10.28), called here the neural network residual generator model:

$$\hat{e}_i = -\Delta\hat{A}(q^{-1})y_i + \Delta\hat{B}(q^{-1})f(u_i) + [B(q^{-1}) + \Delta\hat{B}(q^{-1})]\Delta\hat{f}(u_i). \tag{10.53}$$

The neural network residual generator model is composed of a multilayer
perceptron, modelling the function $\Delta f(\cdot)$, and a linear node with three tap
delay lines (Janczak and Korbicz, 1999). The model inputs are $u_i$, $f(u_i)$,
and $y_i$ (Fig. 10.9). As the neural network residual generator model is a
feedforward neural network, its training can be performed with the well-known
backpropagation algorithm.
Fig. 10.7. Hammerstein system. Non-linear functions $f(u_i)$ and $g(u_i)$

Fig. 10.8. Identification error of the function $\Delta f(u_i)$

10.5.3. Wiener system. Parameter estimation of the residual equation 

Assume that the function $\Delta f^{-1}(\cdot)$, describing a change of the inverse steady-state
characteristic caused by a fault, has the form of a polynomial of the
order $r$:

$$\Delta f^{-1}(y_i) = \mu_0 + \mu_2 y_i^2 + \cdots + \mu_r y_i^r. \tag{10.54}$$
Table 10.1. Parameter estimates

Table 10.2. Comparison of estimation accuracy

Index (estimation accuracy of)                     RLS$_{I}$        RLS$_{II}$       RELS$_{II}$
                                                                    ($\varepsilon_i = \epsilon_i$)   ($\varepsilon_i = \epsilon_i$)
Residuals $e_i$                                    1.31 × 10^(…)    4.51 × 10^(…)    2.57 × 10^(…)
Function $\Delta f(u_i)$                           1.86 × 10^(…)    6.54 × 10^(…)    2.04 × 10^(…)
Parameters $\eta_j$, $j = 2, \ldots, 8$            1.63 × 10^(…)    5.43 × 10^(…)    2.41 × 10^(…)
Parameters $\alpha_j$ and $\beta_j$, $j = 1, 2$    5.52 × 10^(…)    9.58 × 10^(…)    1.77 × 10^(…)
Fig. 10.9. Neural network model of the residual generator



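As in the case of the Hammerstein system, the residual equation
(10.35) can be expressed in the linear-in-parameters form:

$$e_i = -\sum_{j=1}^{na} \alpha_j f^{-1}(y_{i-j}) + \sum_{j=1}^{nb} \beta_j u_{i-j} + d_0 + \sum_{k=2}^{r}\sum_{j=0}^{na} d_{kj} y_{i-j}^{k}, \tag{10.55}$$

where

$$d_0 = -\mu_0 \Big[1 + \sum_{j=1}^{na} (a_j + \alpha_j)\Big], \tag{10.56}$$

$$d_{kj} = \begin{cases} -\mu_k, & j = 0, \\ -\mu_k (a_j + \alpha_j), & j = 1, \ldots, na. \end{cases} \tag{10.57}$$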



For disturbance-free Wiener systems, the parameters $\alpha_j$, $\beta_j$, $d_{kj}$ and
$d_0$ can be calculated by solving a set of $M = na + nb + (na + 1)(r - 1) + 1$ linear
equations. In the stochastic case, parameter estimation with the parameter
vector $\hat{\theta}_i$ and the regression vector $x_i$ defined as

$$\hat{\theta}_i = [\hat{\alpha}_1 \ldots \hat{\alpha}_{na}\ \hat{\beta}_1 \ldots \hat{\beta}_{nb}\ \hat{d}_0\ \hat{d}_{2,0} \ldots \hat{d}_{2,na} \ldots \hat{d}_{r,0} \ldots \hat{d}_{r,na}]^T,$$

$$x_i = [-f^{-1}(y_{i-1}) \ldots -f^{-1}(y_{i-na})\ u_{i-1} \ldots u_{i-nb}\ 1\ y_i^2 \ldots y_{i-na}^2 \ldots y_i^r \ldots y_{i-na}^r]^T$$

results in asymptotically biased parameter estimates. This comes from the
well-known property of the least squares method stating that, to obtain
consistent parameter estimates, the regression vector $x_i$ should be uncorrelated
with the system disturbance $\varepsilon_i$, i.e., $E[x_i\varepsilon_i] = 0$. Obviously, this
condition is not fulfilled for the model (10.55) as the powered system outputs
$y_i^2, \ldots, y_i^r$ depend on $\varepsilon_i$. A common way to obtain consistent parameter
estimates in such cases is the Instrumental Variable (IV) method
or its Recursive (RIV) version. Instrumental variables should be chosen to
be correlated with regressors and uncorrelated with the system disturbance
$\varepsilon_i$. Although different choices of instrumental variables can be made, replacing
$y_i^2, \ldots, y_{i-na}^r$ with their approximated values obtained by filtering $u_i$
through a system model is a good choice. This leads to the following estimation
procedure:

i) estimate the parameters of (10.55) with the LS or RLS algorithm;

ii) simulate the model (10.55) with the obtained parameters to obtain $\bar{y}_i$;

iii) estimate the parameters of (10.55) using the IV or RIV method with
the instrumental variables

$$z_i = [-f^{-1}(y_{i-1}) \ldots -f^{-1}(y_{i-na})\ u_{i-1} \ldots u_{i-nb}\ 1\ \bar{y}_i^2 \ldots \bar{y}_{i-na}^2 \ldots \bar{y}_i^r \ldots \bar{y}_{i-na}^r]^T.$$
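The effect of replacing noise-corrupted regressors with instruments can be seen on a deliberately simple toy problem. The sketch below is not the Wiener residual model itself: it uses a single regressor, takes the noise-free signal as the instrument (standing in for the output of a simulated model), and the function name `iv_estimate` is a choice made for this illustration.

```python
import numpy as np

def iv_estimate(X, Z, e):
    """Instrumental variable estimate: solve (Z^T X) theta = Z^T e."""
    return np.linalg.solve(Z.T @ X, Z.T @ e)

rng = np.random.default_rng(0)
n = 20000
s = rng.normal(size=n)                 # noise-free signal, used as the instrument
x = s + 0.5 * rng.normal(size=n)       # measured regressor, corrupted by noise
e = 2.0 * s                            # target generated with the true parameter 2

X, Z = x[:, None], s[:, None]
theta_ls = np.linalg.lstsq(X, e, rcond=None)[0]   # biased towards 2 / 1.25 = 1.6
theta_iv = iv_estimate(X, Z, e)                   # close to the true value 2

print(theta_ls, theta_iv)
```

With the instruments set equal to the regressors, `iv_estimate(X, X, e)` reduces to the ordinary least squares solution, which is why step i) of the procedure can reuse the same code path.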

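An alternative to the above procedure can be parameter estimation with
the extended least squares algorithm. In this case the parameter vector $\hat{\theta}_i$
and the regression vector $x_i$ are defined as

$$\hat{\theta}_i = [\hat{\alpha}_1 \ldots \hat{\alpha}_{na}\ \hat{\beta}_1 \ldots \hat{\beta}_{nb}\ \hat{d}_0\ \hat{d}_{2,0} \ldots \hat{d}_{2,na} \ldots \hat{d}_{r,0} \ldots \hat{d}_{r,na}\ \hat{c}_1 \ldots \hat{c}_{na}]^T,$$

$$x_i = [-f^{-1}(y_{i-1}) \ldots -f^{-1}(y_{i-na})\ u_{i-1} \ldots u_{i-nb}\ 1\ y_i^2 \ldots y_{i-na}^2 \ldots y_i^r \ldots y_{i-na}^r\ \hat{e}_{i-1} \ldots \hat{e}_{i-na}]^T,$$

where $\hat{e}_i$ denotes the one-step-ahead prediction error of $e_i$:

$$\hat{e}_i = e_i - x_i^T\hat{\theta}_{i-1}. \tag{10.58}$$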



Example 10.2. The Wiener model used in the simulation example consists of 
the linear dynamic system (10.47) and the non-linear element of the following 
inverse non-linear characteristic: 

=yi-^vl- (10.59) 

The Wiener system under its faulty conditions is defined by the pulse transfer 
function (10.49) and the inverse non-linear characteristic (Fig. 10.10) of the 
form 



$$g^{-1}(y_i) = \tan(y_i). \tag{10.60}$$
Fig. 10.10. Wiener system. Inverse non-linear functions $f^{-1}(y_i)$ and $g^{-1}(y_i)$

Fig. 10.11. Identification error of the inverse non-linear function $\Delta f^{-1}(y_i)$

The system was excited with a sequence of 50000 pseudorandom values of a
uniform distribution in (-1, 1). The system output was disturbed additively
with the disturbances (10.51) and (10.52), with $\{\epsilon_i\}$ being a pseudorandom
sequence, uniformly distributed in (-0.01, 0.01). Parameter estimation was
performed with the RLS$_{I}$ algorithm for the disturbance-free case, and the
RLS$_{II}$ and RELS$_{II}$ algorithms for the system disturbed by (10.52). The
parameter estimates are given in Table 10.4, and Table 10.3 shows a comparison
of estimation accuracy. The identification error of the function $\Delta f^{-1}(\cdot)$
is shown in Fig. 10.11. The assumed finite order of the polynomial (10.54),
i.e., $r = 11$, is another source of the identification error as the function
$\Delta f^{-1}(\cdot)$ can be represented accurately only with a power series. As in
Example 10.1, the lower estimation accuracy of the results obtained with the
RLS$_{II}$ algorithm in comparison with the results obtained with the RELS$_{II}$
algorithm shows the inconsistency of the RLS$_{II}$ estimator.



10.6. Five-stage sugar evaporator. Identification of the 
nominal model of steam pressure dynamics 

The main task of a multiple-effect evaporator is thickening thin sugar juice 
from the sugar density of approximately 14 to 65-70 Brix units (Bx). Other 
important tasks are steam generation, waste steam condensation, and sup- 
plying waste-heat boilers with water condensate. In a multiple-effect evapora- 
tion process, the sugar juice and the saturated steam are fed to the successive 
stages of the evaporator with gradually decreasing pressure. Many complex 
physical phenomena and chemical reactions occur during the thickening of 
sugar juice, including sucrose decomposition, calcium compounds precipita- 
tion, acid amides decomposition, etc. A flow of the steam and sugar juice 
through the successive stages of the evaporator results in a close relationship 
between temperatures and pressures of the juice steam in these stages. More- 
over, the juice steam temperature and the juice steam pressure depend also 
on other physical quantities such as the juice flow rate or the juice
temperature.

10.6.1. Theoretical model 

The theoretical approach to process modelling is based on deriving a mathematical
model of the process by applying the laws of physics. Theoretical models
obtained in this way are often too complicated to be used, for example, in con- 
trol applications. In spite of this, theoretical models are a source of valuable 
information on the nature of the process. The information can be very useful 
for the identification of experimental mathematical models based on process 
input-output data. Moreover, theoretical models can serve as benchmarks for 
the evaluation and verification of the experimental models. 

Modelling the multiple-effect sugar evaporator is complicated by the fact 
that it consists of a number of evaporators connected both in series and in 
parallel. All of this makes the formulation of a theoretical model difficult. 
Theoretical models are derived based on mass and energy balances for all 
stages. To perform this, the following assumptions have to be made (Lissane 
Elhaq et al, 1999). 

i) The juice and the steam are in saturated equilibrium. 

ii) Both the mass of the steam in the juice chamber and the mass of the 
steam in the steam chamber are constant. 
iii) The operation of juice level controllers makes it possible to neglect
variations of juice levels. 



iv) Heat losses to the surrounding environment are negligible. 

v) Mixing is perfect. 



The model of the dependence of the steam pressure in the steam
chamber of the stage $k$ on the steam pressure in the steam chamber of
the stage $(k - 1)$, given by Carlos and Corripio (1985), has the form



$$P_k = P_{k-1} - \frac{\gamma_k (O_{k-1})^2}{\rho_k(T_{k-1}, P_{k-1})}, \tag{10.61}$$

$$P_1 = P_0 - \frac{\gamma_1 (O_0)^2}{\rho_1(T_0, P_0)}, \tag{10.62}$$

where $P_k$ denotes the juice steam pressure in the steam chamber of the stage
$k$, $\gamma_k$ is the conversion factor, $O_k$ is the steam flow rate from the stage $k$,
$\rho_k$ is the steam density at the stage $k$, and $T_k$ is the steam temperature at
the stage $k$.

The model (10.61) and (10.62) describes the steady-state behaviour of the
process. To include the dynamic properties of the process, a modified model,
described with differential equations of the first order, was proposed:



_ dPfc_i 

Pk — '^k 1 , P Pk- 

dt 



IkiOk-i)^ 



(10.63) 



Pi 






P?(2o,Po)’ 



(10.64) 



where $\tau_k$ is the time constant.

The equations (10.61)-(10.64) are part of the overall model of the sucrose 
juice concentration process. This model comprises also the mass and energy 
balances for all stages (Lissane Elhaq et al, 1999). 



10.6.2. Experimental models 

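In a neural network model, it is assumed that the model output $\hat{P}_{2,i}$ is a sum
of two components defined by the non-linear function of the pressure $P_{3,i}$
and the linear function of the pressure $P_{1,i}$. The model output at time $i$ is

$$\hat{P}_{2,i} = \hat{f}\left(\frac{\hat{B}(q^{-1})}{\hat{A}(q^{-1})}P_{3,i}\right) + \hat{C}(q^{-1})P_{1,i}, \tag{10.65}$$

$$\hat{A}(q^{-1}) = 1 + \hat{a}_1 q^{-1} + \hat{a}_2 q^{-2}, \tag{10.66}$$

$$\hat{B}(q^{-1}) = \hat{b}_1 q^{-1} + \hat{b}_2 q^{-2}, \tag{10.67}$$

$$\hat{C}(q^{-1}) = \hat{c}_1 q^{-1} + \hat{c}_2 q^{-2}, \tag{10.68}$$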
where $\hat{a}_1$, $\hat{a}_2$, $\hat{b}_1$, $\hat{b}_2$, $\hat{c}_1$ and $\hat{c}_2$ denote the model parameters. The function
$\hat{f}(\cdot)$ is modelled with a multilayer perceptron containing one hidden layer
consisting of $M$ nodes of the hyperbolic tangent activation function:

$$\hat{f}(s_i) = \sum_{j=1}^{M} w_j \tanh(x_{j,i}) + w_{3M+1}, \tag{10.69}$$

where $s_i = [\hat{B}(q^{-1})/\hat{A}(q^{-1})]P_{3,i}$ is the output of a linear dynamic model,
and $w_m$, $m = 1, 2, \ldots, 3M + 1$, are the weights of the neural network. The
activation of the $j$-th neuron at the time $i$ is

$$x_{j,i} = w_{M+j}s_i + w_{2M+j}. \tag{10.70}$$
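The structure described by (10.65), (10.69) and (10.70) can be sketched as a forward computation. The weight vector is assumed to be stored in the order $w_1, \ldots, w_{3M+1}$; the function names `filt` and `nn_wiener_output`, the zero initial conditions and the numerical values in the demonstration are assumptions of this illustration, not identified values.

```python
import numpy as np

def filt(num, den, x):
    """Output of num(q^-1)/den(q^-1) applied to x, with den[0] = 1 and zero initial state."""
    y = np.zeros(len(x))
    for i in range(len(x)):
        acc = 0.0
        for j in range(len(num)):
            if i - j >= 0:
                acc += num[j] * x[i - j]
        for j in range(1, len(den)):
            if i - j >= 0:
                acc -= den[j] * y[i - j]
        y[i] = acc
    return y

def nn_wiener_output(p3, p1, a, b, c, w, M):
    """Model output: MLP applied to the filtered pressure P3 plus an FIR term in P1."""
    s = filt(np.r_[0.0, b], np.r_[1.0, a], p3)      # s_i = [B/A] P3
    w_out, w_in, w_bias, w_off = w[:M], w[M:2*M], w[2*M:3*M], w[3*M]
    x = np.outer(s, w_in) + w_bias                  # x_{j,i} = w_{M+j} s_i + w_{2M+j}
    f_hat = np.tanh(x) @ w_out + w_off              # sum_j w_j tanh(x_{j,i}) + w_{3M+1}
    return f_hat + filt(np.r_[0.0, c], [1.0], p1)   # add C(q^-1) P1

# Tiny demonstration with one hidden node and made-up weights
w_demo = np.array([1.0, 0.0, 0.0, 0.5])
out = nn_wiener_output(np.ones(5), np.ones(5),
                       a=[-0.5, 0.0], b=[1.0, 0.0], c=[1.0, 0.0], w=w_demo, M=1)
print(out)
```

In the demonstration the input weight is zero, so the perceptron contributes only its output offset, and the remaining part of the output is the delayed pressure from the FIR path.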

The neural network structure (Fig. 10.12) contains a Wiener model
(Janczak, 1997a; 1998a) in the path of the pressure $P_{3,i}$ and a linear finite
impulse response filter of the second order in the path of the pressure
$P_{1,i}$. In a linear model of the steam pressure dynamics, the Wiener model is
replaced with a linear dynamic system:

(10.71) 




Fig. 10.12. Neural network model of steam pressure dynamics 



10.6.3. Estimation results 

The parameter estimation of the models (10.65) and (10.71) was performed
based on a set of 10000 input-output measurements recorded at the sampling
interval of 10 s. The RLS algorithm was employed for the estimation of the ARX
model. The neural network Wiener model of the structure $\mathcal{N}(1\text{-}24\text{-}1)$ was
trained recursively with the backpropagation learning algorithm (Janczak,
2000b). For both models 50000 sequential steps were made, processing the
overall data set five times. Then, the models were tested with another data
set of 8000 input-output measurements.

The obtained results of estimation and testing are shown in Figs. 10.13-10.16,
and a comparison of the testing results for both models is given in
Table 10.5. Lower values of the mean square prediction error of the pressure
$P_2$, obtained for the neural network model for both the training and testing
sets, confirm the non-linear nature of the process.

Fig. 10.13. Steam pressure in the steam chamber 2 and
the output of the neural network model - the training set

Fig. 10.14. Steam pressure in the steam chamber 2 and
the output of the neural network model - the testing set

As the analysed model of steam pressure dynamics is characterized by
a short time constant of approximately 20 s, fast fault detection is possible.
Moreover, as the steam pressure is not controlled automatically, there is no 
problem of closed-loop system identification. 







10.7. Summary 

The examined methods of estimating system parameter changes require 
a nominal model of the system, sequences of the system input and out- 
put signals, and the sequence of residuals. For disturbance-free Hammer- 
stein systems with non-linear characteristics described by polynomials, or 
disturbance-free Wiener systems with inverse non-linear characteristics described
by polynomials, the calculation of parameter changes can be performed
by solving a set of linear equations. The least squares method can be
used for the parameter estimation of Hammerstein systems disturbed additively
with (10.41). For other types of output disturbance, asymptotically
biased parameter estimates are obtained with the least squares method. In
this case, consistent parameter estimates can be obtained using other parameter
estimation methods, e.g., the extended least squares method.

The estimation of parameter changes of Wiener systems with the least 
squares method results in asymptotically biased parameter estimates. To ob- 
tain asymptotically unbiased parameter estimates, a two-step estimation pro- 
cedure has been proposed, which combines the least squares and instrumental 
variable methods. An alternative solution can be parameter estimation with 
methods that estimate the parameters of noise models, such as the extended
least squares method.

All these methods are useful for systems with non-linear characteristics 
or inverse non-linear characteristics described by polynomials. For systems 
with characteristics described by power series, the estimation of parameters of 
linear dynamic systems and changes of the non-linear characteristic for Ham- 
merstein systems or inverse non-linear characteristic for Wiener systems can 
be performed by identifying neural network models of the residual equation. 
System nominal models are identified based on input-output measurements 
recorded under normal operating conditions. In spite of the fact that such 
measurements are often available for many real processes, the identification 
of nominal models at a high level of accuracy is not an easy task, as it has 
been shown in the sugar evaporator example. The complex non-linear nature 
of the sugar evaporation process, disturbances of a high intensity and corre- 
lated inputs that do not fulfill the persistent excitation condition are among 
the reasons for the observed difficulties in achieving high accuracy of system 
modelling. 
Chapter 11



APPLICATION OF FUZZY LOGIC 
TO DIAGNOSTICS 

Jan Maciej KOSCIELNY*, Michal SYFERT 



11.1. Introduction 

The basics of fuzzy logic as well as fuzzy modelling and control are described,
for example, in the monographs by Czogała and Łęski (2000), Yager and
Filev (1994), Driankov et al. (1996), Rutkowska (2002), and Piegat (2001).
An interesting overview of fuzzy logic applications to fault detection and
isolation can be found in Frank and Marcu (2000). This chapter presents the
application of fuzzy logic to fault detection and isolation.

A growing interest in the development and industrial applications of
fuzzy modelling methods has been observed in recent years. It is visible
in both the number of papers published and the number of software
packages developed for such applications.
Fuzzy models, similarly as neural ones, are suitable not only for the control, 
optimisation and estimation of variables that cannot be measured, but also 
for fault detection and isolation. A vital advantage of fuzzy techniques is the 
ability of modelling non-linear systems. Models of systems in the state of 
complete efficiency are obtained on the grounds of data from experiments, 
with the use of various training methods. A fuzzy system description in the 
form of a network (fuzzy neural network) makes the application of learning 
methods developed for neural networks possible. 

In automated industrial processes, both current process data and 
archived ones are attainable. This creates an advantageous situation for build- 
ing models on the grounds of process measured data as well as an expert’s 
knowledge about the relationship which exists between the variables (i.e.,
the model structure). On the other hand, the rapid development of com- 
puter techniques has eliminated a vital barrier related to significant calcula- 
tion efforts required for tuning fuzzy model parameters with the use of large 
data sets. 

Fuzzy logic is a very efficient tool for the conversion of uncertain and in- 
accurate information. Most data obtained in industrial practice have such a 
character. Disturbances and measurement noise exist in the process. Measure- 
ment data contain errors. The models of the system are not quite accurate, so 
the calculated residual values are not precise. Uncertainties exist also when 
experts try to establish the diagnostic relation. Fuzzy logic is a natural way 
of taking these uncertainties into account, therefore it can be successfully 
applied to diagnosing algorithms for residual values evaluation, description 
of the relation between faults and symptoms, and for inference. 



11.2. Fault detection 

Fuzzy logic is applied to fault detection mainly through methods based on 
the use of models. The models make it possible to calculate process variable 
values. Signals calculated in this way are redundancies of measured signals. 
Redundancy which consists in comparing the measured signals with the cal- 
culated ones is called information redundancy. 

Different kinds of models can be applied, including fuzzy ones, which are 
an alternative to analytical ones. They allow us to describe the operation of 
a system in a way which is natural for the system operator, i.e., in the form 
of if-then rules. The basics of fuzzy modelling are described, for example, 
in (Babuska and Verbruggen, 1996; Czogała and Łęski, 2000; Piegat, 2001;
Rutkowska, 2002; Wang and Mendel, 1992a; 1992b; Yager and Filev, 1994). 

Figure 11.1 presents a general diagram of residual generation with the use
of a fuzzy model. A residual $r$ is obtained as a result of a comparison between
the model output $y_M$ and the real measured signal $y$. In the normal state,
the residual value is close to zero, but it differs from zero when a fault
appears in the controlled part of the system. It is assumed that the process
of obtaining the models is carried out in the initial part of the operation
of the diagnosed system. This process should also be repeated after each
modification or repair of the system.

Fig. 11.1. Diagram of residual generation with the use of fuzzy models

Many fuzzy modelling techniques are known that can be applied to fault 
detection. The following ones will be presented in this chapter: 

• Wang and Mendel’s (WM) models and their modified versions, 

• fuzzy neural networks. 

The knowledge of an expert, e.g., an engineer or a process operator, can
be used for constructing the model: on its basis, rules that define the system
operation are formulated. A direct approach to fuzzy
model construction, however, has serious disadvantages. In the case when the
expert’s knowledge is incomplete or erroneous, the obtained model may be
incorrect. While constructing the model, one should also use measurement
data. Large sets of process variable values are collected and archived by
Distributed Control Systems (DCS) or Supervisory Control And Data Acquisition
(SCADA) systems. It is advantageous to combine the expert’s knowledge
with the available measurement data during the creation of a fuzzy model.
The expert’s knowledge is used for defining the structure as well as the initial
values of the model’s parameters (distribution of membership functions),
while measurement data are used for model tuning. The basic information
on fuzzy models is presented in Chapter 2.3.6.

11.2.1. Wang and Mendel’s fuzzy models 

Wang and Mendel’s models are simple fuzzy models represented by a set of 
rules of the following form: 



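$$R_i: \text{If } (x_1 = A_{1i_1}) \text{ and } (x_2 = A_{2i_2}) \text{ and } \ldots \text{ and } (x_n = A_{ni_n}) \text{ then } (y = B_{i_{n+1}}), \tag{11.1}$$

where $x_k$ denotes the $k$-th input, $A_{ki_k}$ is the $i_k$-th partition of the $k$-th input,
$y$ denotes the output and $B_{i_{n+1}}$ is the $i_{n+1}$-th partition of the output.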

What is characteristic in these models is the way they are constructed 
as suggested by Wang and Mendel (1992a; 1992b). Wang and Mendel’s iden- 
tification method allows constructing a fuzzy model of the system based on 
measurement data which has a structure defined by an expert. 

11.2.1.1. Construction of fuzzy models using Wang 
and Mendel’s method 

The identification process is carried out in the following four phases: 

Phase 1 - Definition of linguistic variables and their division into regions 

At the beginning, the input signals as well as the output signal of the model
should be defined. Measurement data which contain values of the input signals
and the output signal $(x_1, x_2, \ldots, x_n, y)$ obtained at n successive sampling
moments are used for identification. The range of the change of each signal is
divided into regions, to which fuzzy sets are attributed (i.e., linguistic terms
(names) and membership functions). The division may be uniform or not, and
the membership function may have different forms. The higher the number
of regions, the higher the accuracy of modelling, but also the complexity of
the model. Both static and dynamic models can be identified with the WM
method. In the case of dynamic models, signal values that correspond to
different sampling moments are brought to the model’s input.

Phase 2 - Generation of rules on the grounds of an experiment 

A rule having the form (11.1) is created for each set of the input-output data.
In the $i$-th rule there appear such partitions $A_{ki_k}$ and $B_{i_{n+1}}$ of linguistic
variables for which the membership function value of a signal value is the
highest:

$$\forall_k\ A_{ki_k} = A_{kj} : \max_j \{\mu_{kj}(x_k)\}, \tag{11.2}$$

$$B_{i_{n+1}} = B_j : \max_j \{\mu_j(y)\}, \tag{11.3}$$

where $\mu_{kj}(x_k)$ denotes the coefficient of the membership of the $k$-th input
to the $j$-th fuzzy set ($j$-th partition of the linguistic variable), while $\mu_j(y)$
is the coefficient of the membership of the output to the $j$-th fuzzy set.

Phase 3 - Attribution of a weight to each one of the rules 
Since each set of data generates one rule and the number of experimental
data is relatively high, it is very probable that contradictory rules, i.e.,
rules that have identical predecessors but different consequents, are
generated. In order to solve this problem, a weight $w_i$ is attributed to each
one of the rules. The weight is calculated as the product of the membership
coefficients $\mu_{ki_k}(x_k)$ of the predecessors and $\mu_{i_{n+1}}(y)$ of the consequent of the
rule:

$$w_i = \mu_{i_{n+1}}(y) \prod_{k=1}^{n} \mu_{ki_k}(x_k). \tag{11.4}$$



Phase 4 - Creation of the rule base 

On the grounds of rules and their weights defined during Phases 2 and 3, 
the base of the knowledge of a fuzzy model is created. Only such rules which 
have the highest weights are chosen out of the contradictory ones. 
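The four phases can be sketched compactly in code. The sketch below assumes uniformly spaced triangular membership functions and represents each partition by its index; the function names and the small data set are made up for the illustration and are not part of the original method description.

```python
import numpy as np

def tri_memberships(x, centers):
    """Membership degrees of x in triangular sets peaking at uniformly spaced centers."""
    width = centers[1] - centers[0]
    return np.clip(1.0 - np.abs(x - centers) / width, 0.0, 1.0)

def wang_mendel(X, y, in_centers, out_centers):
    """Return a rule base {antecedent index tuple: consequent index}."""
    best = {}
    for xs, yi in zip(X, y):
        mu_in = [tri_memberships(xk, c) for xk, c in zip(xs, in_centers)]
        mu_out = tri_memberships(yi, out_centers)
        ante = tuple(int(np.argmax(m)) for m in mu_in)    # Phase 2: best input partitions
        cons = int(np.argmax(mu_out))                     # Phase 2: best output partition
        weight = mu_out[cons] * np.prod([m[i] for m, i in zip(mu_in, ante)])  # Phase 3
        if ante not in best or weight > best[ante][1]:    # Phase 4: keep the highest weight
            best[ante] = (cons, weight)
    return {ante: cons for ante, (cons, _) in best.items()}

# Phase 1: three partitions (e.g. small, medium, large) for every signal
centers = np.array([0.0, 0.5, 1.0])
X_data = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.05, 0.0]])
y_data = np.array([0.0, 1.0, 0.5, 0.9])   # the last sample contradicts the first one
rules = wang_mendel(X_data, y_data, [centers, centers], centers)
print(rules)
```

The last sample produces a rule with the same predecessors as the first one but a different consequent; its weight is lower, so the first rule is retained.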

The rule base contains the relations that exist between the input and
output variables and constitutes the system model. In a general case, the rule
base has $n$ dimensions (where $n$ is the number of input variables). In the
case of two input signals, the rule base has the form of a table (Fig. 11.2). The
linguistic variable value that appears in the conclusion of a rule (the name of
the fuzzy set for the output signal) is written down at the intersection of the
column that corresponds to the partition of the signal $x_1$ and the row that
corresponds to the partition of the signal $x_2$.

Fig. 11.2. Example of a rule base. The rule “If $(x_1 = S_1)$ and $(x_2 = M_2)$ then
$(y = M)$” is indicated by the dark field, where M stands for medium, S - small,
and L - large

The rule base allows evaluating the completeness and correctness of
training data. It is possible to detect missing or false rules. Empty fields in
the base that are surrounded by rules obtained in the process of identification
correspond to missing rules. Such fields can be filled in on the grounds
of an expert’s knowledge, or by averaging the output signal linguistic variable
values placed around the empty field. False rules can be recognised as
discontinuities of the values of the linguistic variable for the output signal.
For example, if the values S, L, S appear in three successive fields of a
particular column or row, it is highly probable that the value L is false
since technical systems have continuous ranges of operation.

11.2.1.2. Modification of Wang and Mendel’s method 

The research carried out at the Institute of Automatic Control and Robotics 
of the Warsaw University of Technology (Koscielny et al., 1999b; Syfert, 2003)
has shown that Wang and Mendel’s method is highly sensitive to measure- 
ment disturbances. This disadvantage results from the way of reducing the 
set of contradictory rules in the fourth phase of model creation. The algo- 
rithm takes into account only the weight of a rule, not the number of the 
obtained identical rules. The weight of the rule is sensitive to the effect of 
measurement noise. For instance, if we have a deformed output signal which 
has a high degree of membership to particular fuzzy sets, then such a rule 
can eliminate other rules that are inconsistent with this one but nevertheless 
are true ones. The longer the learning period, the higher the probability of 
the existence of such a situation. 

In order to ensure a higher resistance to weak learning data, a modification
of the method was introduced (Koscielny et al., 1999b; Syfert, 2003).
Phases 3 and 4 were modified in comparison with the original method. In
the modified method, instead of choosing the rule which has the highest weight,
the average value of consequents is calculated for rules which have the
same predecessors, according to the following formula:

$$\text{If } (m > 0) \text{ then } \left(\bar{B}_{m+1} = \frac{m\bar{B}_m + B_{i,m+1}}{m + 1}\right), \tag{11.5}$$

where $m$ denotes the number of rules having identical predecessors which
were previously found, $\bar{B}_m$ is the average value of the consequent for these
$m$ rules, and $B_{i,m+1}$ denotes the value of the consequent of the currently
added rule.

Fig. 11.3. Algorithm of residual calculation with the use of the fuzzy model

The presented algorithm allows constructing models that have Multiple
Inputs and a Single Output (MISO). It is easy to proceed to models which
have Multiple Inputs and Multiple Outputs (MIMO) by the AND-type assemblage
of several MISO-type models.

The modified version ensures a higher resistance to low-quality learning
data. It is therefore better suited to the creation of models from data
measured in a real process.
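
The averaging of consequents for rules with identical predecessors can be sketched in Python. This is only an illustration under stated assumptions: each consequent is encoded as a number (for example, the peak position of the output fuzzy set), the update is the ordinary running mean B_{m+1} = (m B_m + B_new) / (m + 1), and the class and attribute names are hypothetical rather than taken from the cited works.

```python
from collections import defaultdict


class AveragedRuleBase:
    """Running average of consequents for rules with identical predecessors."""

    def __init__(self):
        self.count = defaultdict(int)    # m: rules found so far per predecessor
        self.mean = defaultdict(float)   # average consequent for these m rules

    def add_rule(self, predecessor, consequent):
        m = self.count[predecessor]
        # Running mean; for m = 0 the result is simply the new consequent.
        self.mean[predecessor] = (m * self.mean[predecessor] + consequent) / (m + 1)
        self.count[predecessor] = m + 1
        return self.mean[predecessor]


base = AveragedRuleBase()
for consequent in (6.0, 7.0, 6.0, 6.0):   # one disturbed sample among four
    base.add_rule(("S5",), consequent)
print(round(base.mean[("S5",)], 2))
```

A single disturbed sample shifts the averaged consequent only slightly, whereas in the original method one heavily weighted rule could replace the other three.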

11.2.1.3. Calculation of a residual on the basis of the fuzzy model 

An algorithm for the calculation of a residual on the basis of the fuzzy model
is presented in Fig. 11.3. It is a diagram of inference applied to Mamdani’s
models. The values of all coefficients of membership to all fuzzy sets
$A_{kj}$ of the linguistic variable $x_k$ are calculated in the fuzzyfication
block for each one of the variables. The rule activation level is defined on
the grounds of the obtained $\mu_{k i_k}(x_k)$ values. It is equal to the
product of the coefficients of the membership of the obtained values of
particular variables to fuzzy sets which are premises of that rule:

$$w_i = \prod_{k=1}^{n} \mu_{k i_k}(x_k). \tag{11.6}$$



The value of the model’s output is calculated in the defuzzyfication block.
The fuzzy output set is defined by rules whose activation level is higher than
zero. Fuzzy sets of the output signal are cut at the value which is equal to the
activation level of a particular rule. The assemblage of fuzzy sets that were
cut in this way creates the fuzzy output set. It is shown in the defuzzyfication
block in Fig. 11.3. One of the known defuzzyfication methods, the Centre of
Area (COA) method (Driankov et al., 1996; Rutkowska, 2002; Yager and Filev,
1994), is applied to the calculation of the output value. The output of the
fuzzy model is calculated in this case according to the following formula:

$$\hat{y} = \frac{\int y\,\mu(y)\,dy}{\int \mu(y)\,dy}, \tag{11.7}$$

where $\mu(y)$ denotes the membership function of the fuzzy set of the output
variable, calculated from the assemblage of the cut membership functions of
the fuzzy sets which correspond to the conclusions of active rules (those
having activation level values higher than zero).

The value of the output variable which was calculated on the grounds of
the fuzzy model is compared with the real value, and normalised according
to the formula

$$r = \frac{y - \hat{y}}{y_{MAX} - y_{MIN}} \cdot 100\%, \tag{11.8}$$

where $r$ denotes the value of the residual, $y$ is the measured value,
$\hat{y}$ denotes the model output, and $(y_{MAX} - y_{MIN})$ is the nominal
range of the variable.
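
The chain from fuzzyfication to the normalised residual can be sketched in Python with NumPy. This is a minimal illustration, not the authors' implementation: triangular membership functions, the maximum as the assemblage of the cut output sets, and a uniform grid as the approximation of the integrals in the COA formula are all assumptions, and every function name is hypothetical.

```python
import numpy as np


def tri(x, a, b, c):
    """Triangular membership function with feet a, c and peak b (a < b < c)."""
    x = np.asarray(x, dtype=float)
    return np.clip(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0, 1.0)


def mamdani_output(x, in_sets, out_sets, rules, y_grid):
    """rules: list of (tuple of input-set indices, output-set index)."""
    aggregated = np.zeros_like(y_grid)
    for premise, conclusion in rules:
        # Activation level: product of the premise membership degrees.
        w = np.prod([tri(x[k], *in_sets[k][i]) for k, i in enumerate(premise)])
        if w > 0.0:
            cut = np.minimum(tri(y_grid, *out_sets[conclusion]), w)
            aggregated = np.maximum(aggregated, cut)   # assumed: union by max
    area = aggregated.sum()
    if area == 0.0:
        raise ValueError("no rule is active for this input")
    # Centre of area, approximated on the uniform grid.
    return float((y_grid * aggregated).sum() / area)


def residual_percent(y_measured, y_model, y_min, y_max):
    """Difference normalised by the nominal range, in percent."""
    return (y_measured - y_model) / (y_max - y_min) * 100.0


y_grid = np.linspace(0.0, 10.0, 1001)
sets = [(-5.0, 0.0, 5.0), (0.0, 5.0, 10.0), (5.0, 10.0, 15.0)]  # low, medium, high
rules = [((0,), 0), ((1,), 1), ((2,), 2)]   # "if x is low then y is low", etc.
y_model = mamdani_output([5.0], [sets], sets, rules, y_grid)
r = residual_percent(5.5, y_model, 0.0, 10.0)
print(round(y_model, 3), round(r, 3))
```

In this toy case only the middle rule fires, so the model output is the centre of the middle output set, and a measured value half a unit higher gives a residual of about 5% of the nominal range.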



11.2.2. Fuzzy neural networks 

Fuzzy neural networks, which are a combination of fuzzy modelling techniques
and methods of neural network training, make a convenient tool for the
generation of residuals. Such networks were suggested by Horikawa and his
co-workers (1991), and were also examined by Bossley (1997), Fuller (1995),
Jang (1995), Yager and Filev (1994), and Zhang et al. (1996).

During the construction of a fuzzy neural network, an expert’s knowledge
can be applied to define the number of if-then rules and to perform
an initial distribution of the membership functions of particular inputs, and
measurement data can be used for network training (tuning of the network
weights). The model obtained with the use of fuzzy neural networks is not a
black box. It may easily be written in the form of if-then rules and
interpreted as a fuzzy model.







J.M. Koscielny and M. Syfert 



Fuzzy Neural Networks (FNN) have a structure which represents the 
fuzzy inference process. Two parts can be singled out in such networks. The 
first part corresponds to rule premises, i.e., to the part contained between 
the words If . . . then. It realises the part of the inference mechanism that is 
responsible for the calculation of the rule activation level. The second part, 
which is contained after the word then . . . , corresponds to fuzzy rule conclu- 
sions. It realises the calculation of the network output using the elaborated 
premise values. 

Three basic kinds of fuzzy neural networks may be discerned. The
premise part is identical for all three kinds, and differences appear in the
conclusion part. Three forms of the representation of the output are applied:
a constant (singleton), a linear combination of inputs, and a fuzzy set.

11.2.2.1. Fuzzy neural networks with outputs in the form of singletons 

Fuzzy neural networks with outputs in the form of singletons are defined by
a set of rules having the following form:

$R_i$: If $x_1$ is $A_{1 i_1}$ and $x_2$ is $A_{2 i_2}$ and ... and $x_n$ is $A_{n i_n}$
then $y$ is $y_i$, (11.9)

where $x_k$ ($k = 1, 2, \dots, n$) denotes the $k$-th input of the model ($n$
is the number of inputs), $A_{k i_k}$ denotes the fuzzy set of the predecessor
that has the membership function $\mu_{A_{k i_k}}(x_k)$, $y$ is the model
output, and $y_i$ denotes the value of the output (the singleton, i.e., a
constant).

An example of a fuzzy neural network which has two inputs, nine rules,
and uses the bell-shaped Gaussian function for the description of the
membership function shape is presented in Fig. 11.4. Five layers can be
singled out in the network structure. Layers (A) to (C) correspond to the
predecessor part of the rules, and realise the calculations of the rule
activation level. The network output is calculated in Layers (D) and (E),
which correspond to the rule conclusion. Layers (A), (B) and (C) of the fuzzy
neural network structure are responsible for the elaboration of the
predecessor membership function values. Layer (A) of the network has a
symbolic form only and is used for delivering the network inputs, as well as
the signal equal to 1, to particular units of Layer (B). Layer (B) reflects
the input signal fuzzyfication operations. The number of adding nodes for
particular inputs equals the number of fuzzy sets into which the range of
input changes is divided. The output of Layer (C) is the value of the
membership function of the set $A_{k i_k}$ for a particular input $x_k$. The
coefficients of the input membership to particular fuzzy sets that are
described by the Gaussian function

Ja;,) = (11.10) 

are calculated in this layer for each one of the inputs.

Fig. 11.4. Fuzzy neural network with two inputs and nine rules

Analysing the above formula, it is possible to observe that the weights
$w_c$ and $w_g$ are parameters that determine the position of the membership
function in the particular input space as well as its shape or, more
precisely, its slope.

The number of rules in the network is defined by

$$m = \prod_{k=1}^{n} j_k, \tag{11.11}$$

where $j_k$ denotes the number of fuzzy sets for the $k$-th input.

The inference process is carried out in Layers (D) and (E). The activation
level of particular rules is calculated in Layer (D) according to the following
formula:

$$\tau_i = \prod_{k=1}^{n} \mu_{A_{k i_k}}(x_k). \tag{11.12}$$
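
The rule grid of (11.11) and the activation levels of (11.12) can be illustrated with a short Python sketch. The Gaussian parameterisation by a centre and a width is an assumption made for the example and is not necessarily the form of (11.10); the function names are hypothetical.

```python
import itertools
import math


def gaussian(x, centre, width):
    """Bell-shaped membership function (assumed parameterisation)."""
    return math.exp(-((x - centre) / width) ** 2)


def firing_levels(x, partitions):
    """partitions[k] lists the (centre, width) pairs of the sets of input k.

    Returns a dict mapping each rule index tuple (i_1, ..., i_n) to the
    product of the membership degrees of its premises.
    """
    index_ranges = [range(len(p)) for p in partitions]
    levels = {}
    for rule in itertools.product(*index_ranges):
        levels[rule] = math.prod(
            gaussian(x[k], *partitions[k][i]) for k, i in enumerate(rule)
        )
    return levels


partitions = [
    [(0.0, 0.5), (0.5, 0.5), (1.0, 0.5)],   # three fuzzy sets for input 1
    [(0.0, 0.5), (0.5, 0.5), (1.0, 0.5)],   # three fuzzy sets for input 2
]
levels = firing_levels([0.5, 0.0], partitions)
print(len(levels))      # number of rules: 3 * 3
print(levels[(1, 0)])   # both inputs lie exactly on the set centres
```

Because every combination of input sets yields one rule, the dictionary grows as the product of the partition sizes, which is the growth discussed later for models with many inputs.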

The network output is calculated in Layer (E) by the expression 

m 

y* = — - (11-13) 

i=l 

The weights of the network connections existing between Layers (D) 
and (E) represent the singleton values yi in the rules (11.9), =yi. 

The above network structure is the (1st-type) modification of the network
suggested initially by Horikawa et al. (1991). As the membership function
they used a function consisting of one or two sigmoid functions described
as follows:






1 



(11.14) 



If the weights are adequately chosen, a combination of two sigmoid functions 
results in a pseudo-trapezoid function. 



11.2.2.2. TSK-type fuzzy neural networks 

Fuzzy models that realise the Takagi-Sugeno-Kang (TSK) mechanism of in- 
ference (Kang and Sugeno, 1987; Takagi and Sugeno, 1985) are defined by 
the following rules: 



$R_i$: If $x_1$ is $A_{1 i_1}$ and $x_2$ is $A_{2 i_2}$ and ... and $x_n$ is $A_{n i_n}$
then $y = f_i(x) = a_{i0} + a_{i1} x_1 + \dots + a_{in} x_n$. (11.15)

A fundamental difference between the above model and the one expressed
by (11.9) lies in the consequent, which has the form of a linear equation of
the input variables $x$. The TSK method therefore unites fuzzy and analytical
modelling.

The output of the TSK model is derived from the rules (11.15) as

$$y^* = \sum_{i=1}^{m} \tau_i \left(a_{i0} + a_{i1} x_1 + \dots + a_{in} x_n\right), \tag{11.16}$$

where $\tau_i$ is the activation level of the $i$-th rule.

A TSK-type fuzzy neural network that has two inputs and nine rules with
Gaussian membership functions is presented in Fig. 11.5. Layers (A), (B), (C)
and (D) are identical to those of the network shown in Fig. 11.4 since the
rule activation level is calculated in the same way as in that network.
Layer (E), like Layer (A), only transmits appropriate inputs of the network
to subsequent layers. The linear equation of consequents is elaborated in
Layers (E), (F) and (G), where the weights denote appropriate coefficients
of this equation. The network output is calculated in Layer (H) according
to (11.16) using the firing levels obtained in Layer (D), as well as the
consequents prepared in Layer (G).

In the case of the parametric identification of TSK-type models, not only
the parameters such as $w_c$ and $w_g$ are estimated but also the parameters
$a_{ij}$ ($j = 0, 1, \dots, n$) of the function $f_i(x)$, which are
represented as the weights in the network.
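
The forward pass of a TSK model can be sketched in Python as follows. The sketch reads (11.16) as a plain sum of the rule consequents weighted by the activation levels; many TSK implementations additionally divide by the sum of the activation levels, which is offered here as an optional flag. The Gaussian premise form, the function name and the example coefficients are assumptions made for illustration.

```python
import math


def tsk_output(x, rules, normalise=False):
    """rules: list of (premise, coefficients).

    premise: one (centre, width) pair per input (assumed Gaussian sets);
    coefficients: [a_i0, a_i1, ..., a_in] of the linear consequent.
    """
    weighted, total = 0.0, 0.0
    for premise, coeffs in rules:
        # Activation level: product of the premise membership degrees.
        tau = math.prod(
            math.exp(-((xk - c) / s) ** 2) for xk, (c, s) in zip(x, premise)
        )
        # Linear consequent f_i(x) = a_i0 + a_i1 x_1 + ... + a_in x_n.
        f_i = coeffs[0] + sum(a * xk for a, xk in zip(coeffs[1:], x))
        weighted += tau * f_i
        total += tau
    return weighted / total if normalise else weighted


rules = [
    ([(0.0, 1.0)], [1.0, 2.0]),    # near x = 0: y = 1 + 2x
    ([(1.0, 1.0)], [0.0, -1.0]),   # near x = 1: y = -x
]
print(tsk_output([0.0], rules))
print(tsk_output([0.0], rules, normalise=True))
```

During parametric identification both the premise parameters and the coefficients of the consequents would be adjusted; the sketch only evaluates a model whose parameters are already fixed.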

Fig. 11.5. TSK-type fuzzy neural network

The steepest descent method is usually applied to the identification of the
parameters of fuzzy neural networks that have outputs in the form of
singletons, as well as of TSK-type networks. In the context of neural network
training, this method is known as the error backpropagation method.

The number of rules of fuzzy models grows rapidly with an increase in 
the number of inputs and the number of fuzzy sets for particular inputs. 
This fact limits the application of fuzzy models to relatively simple systems 
such as parts of installations. In order to lower the number of input signals, 
their aggregation may be carried out. It consists in replacing a subset of 
input signals by a signal that is their appropriately chosen function. The 
aggregation can be carried out simultaneously for several subsets of input 
signals. It can be applied not only to fuzzy models but also to neural ones. 



11.2.2.3. Fuzzy neural networks with outputs in the form of fuzzy sets 

Fuzzy neural networks with outputs in the form of fuzzy sets are defined by
the following set of rules:

$R_i$: If $x_1$ is $A_{1 i_1}$ and $x_2$ is $A_{2 i_2}$ and ... and $x_n$ is $A_{n i_n}$
then $y$ is $B_j$ with $LTV_{R_i}$, (11.17)

where $x_k$ ($k = 1, 2, \dots, n$) denotes the $k$-th input of the model ($n$
is the number of inputs), $A_{k i_k}$ is the fuzzy set of the predecessor that
has the membership function $\mu_{A_{k i_k}}(x_k)$, $y$ is the output of the
model, $B_j$ denotes the fuzzy set of the consequent, and $LTV_{R_i}$ is the
certainty degree of the $i$-th rule.

An example of a fuzzy neural network which has two inputs, nine rules,
two membership functions defined for the consequent, and uses the bell-shaped
Gaussian function for the description of the shapes of membership functions
for the predecessors, as well as triangle-shaped membership functions for
the output, is presented in Fig. 11.6. The main difference between this model
and the one described by (11.9) is the consequent, which has the form of a
fuzzy set.

The structure of the above network is the (3rd-type) modification of the
network suggested initially by Horikawa et al. (1991). Layers (A), (B), (C)
and (D) are identical to those of the network shown in Fig. 11.4 since the
rule firing level is calculated in the same way as in that network.

The inference process is carried out in Layers (E) to (I). The weights
of the network represent the certainty degrees of the rules, and the weights
$w'_g$ and $w'_c$ are responsible for output partition parameters.

Fig. 11.6. Fuzzy neural network with two inputs and nine rules



Finally, the output of the network is described by the formula

$$y = \frac{\sum_{j=1}^{j_{n+1}} \mu'_j\, B_j^{-1}(\mu'_j)}{\sum_{j=1}^{j_{n+1}} \mu'_j}, \tag{11.18}$$

where $\mu'_j$ denotes the degree of the premises fulfilment, i.e., the output
of Layer (E), $B_j^{-1}(\mu'_j)$ is the inverse of the membership function of
the $j$-th partition of the output, and $j_{n+1}$ is the number of output
partitions. Moreover,



u'. ^ 

y'j m 

E Mi 

i=l 



(11.19) 



$$B_j^{-1}(\mu'_j) = w'_g\,\mu'_j + w'_c, \tag{11.20}$$



where wl correspond to the certainty degree of the z-th rule LTVr. 



11.2.3. Example of fault detection 

Chapter 22 presents applications of diagnostic systems to a steam draft and
an evaporation station. Wang and Mendel’s fuzzy models as well as fuzzy
neural networks are used for fault detection in these applications. An example
of the application of a fuzzy model to control valve modelling in a three-tank
system is presented below. A similar example for a pneumatic servo-motor
control valve assembly can be found in (Koscielny and Bartys, 1997).

Fuzzy models can be applied to the detection of faults of the control
valve in the three-tank system. The task is realised with the help of the
control signal $U$ and the volumetric flux $F$ of the medium. The residual is
obtained as a result of comparing the output of the model and the real
volumetric flux signal:

$$r = F - \hat{F}. \tag{11.21}$$

Therefore, the working characteristic of the actuator should be identified:

$$\hat{F} = f(U). \tag{11.22}$$

In the first phase, the ranges of the change of the input variable $U$
and the output variable $F$ should be divided into regions (partitions), and
membership functions ought to be attributed to each one of them.
Triangle-shaped membership functions were attributed to the ranges of the
variables (Figs. 11.7 and 11.8).

In the second phase, rules are defined on the grounds of experimental
data, i.e., the pairs $(u_i, f_i)$. According to Fig. 11.7, the signal $u_0$
can be treated with a degree of 0.2 as S6 and, simultaneously, it can be
treated with a degree of 0.8 as S5.




Fig. 11.7. Example of the definition of the linguistic variable U
(control signal), where S denotes “small” and B “big”

Fig. 11.8. Example of the definition of the linguistic variable F (volumetric flux)

The value of the output $f_0$ corresponds to the value of $u_0$; according
to Fig. 11.8, it can be treated with a degree of 0.35 as S6 and with a degree
of 0.65 as S7. Since the coefficient of the membership of $u_0$ to the fuzzy
set S5 is higher than the coefficient of its membership to the fuzzy set S6,
and the coefficient of the membership of $f_0$ to the set S7 is higher than
the coefficient of its membership to the set S6, the pair $(u_0, f_0)$
generates the rule

If U is S5 then F is S7. (11.23)

Similarly, it is possible to obtain the following rules for the pairs
$(u_1, f_1)$ and $(u_2, f_2)$, respectively:

If U is S5 then F is S6, (11.24)

If U is S4 then F is S6. (11.25)

Attributing a weight to each one of the rules is realised in the third phase.
The weight of the $i$-th rule is calculated as the product of the membership
coefficients of the predecessors and the consequent of this rule. In our
example, the weight of the rule (11.23) equals 0.8 × 0.65 = 0.52, the weight
of the rule (11.24) equals 0.8 × 0.8 = 0.64, and the weight of the rule
(11.25) equals 0.6 × 0.55 = 0.33. The rules (11.23) and (11.24) are mutually
inconsistent. Since the weight of the rule (11.24) is higher than the weight
of the rule (11.23), the latter is eliminated. All of the obtained rules are
written down in the rule base. In our case, the base has the simplest form of
a one-dimensional table presented in Fig. 11.9.
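
The arithmetic of the third phase and the elimination of the inconsistent rule can be reproduced with a few lines of Python. The membership degrees are those quoted in the example, with the fuzzy set labels written as S4 to S7; the data layout and variable names are hypothetical.

```python
# Membership degrees (premise, consequent) quoted in the example.
candidates = [
    {"id": "11.23", "if_U": "S5", "then_F": "S7", "degrees": (0.8, 0.65)},
    {"id": "11.24", "if_U": "S5", "then_F": "S6", "degrees": (0.8, 0.8)},
    {"id": "11.25", "if_U": "S4", "then_F": "S6", "degrees": (0.6, 0.55)},
]

rule_base = {}
for rule in candidates:
    mu_u, mu_f = rule["degrees"]
    weight = mu_u * mu_f   # phase 3: product of the membership degrees
    best = rule_base.get(rule["if_U"])
    # Among rules with the same premise, keep only the one with the highest weight.
    if best is None or weight > best[1]:
        rule_base[rule["if_U"]] = (rule["then_F"], weight, rule["id"])

for premise, (conclusion, weight, rule_id) in sorted(rule_base.items()):
    print(f"If U is {premise} then F is {conclusion} "
          f"(rule {rule_id}, weight {weight:.2f})")
```

The loop keeps rule (11.24) for the premise S5 and discards rule (11.23), in agreement with the elimination described above.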



U    S7   S6   S5   S4   ...   B6   B7
F    S7   S7   S6   S6   ...   B7   B7

Fig. 11.9. Form 1 of the rule base for the servo-motor control valve assembly

Values (fuzzy sets of outputs) attributed to particular elements of the 
table of rules correspond to rule consequences while table index elements 
correspond to rule premises. In our simple case, however, it is easier to present 
the rule base in the form of the table presented in Fig. 11.10, in which the 
columns correspond to the values of the volumetric flux, and the rows to the 
values of the control signal. This makes the determination of the base easier, 
e.g., allows detecting incorrect rules. 

Figure 11.11 presents a curve of the fuzzy model of the control valve 
assembly. Certainly, the model curve shape corresponds to the rules pre- 
sented in form 2 (Fig. 11.10) and reflects the non-linear characteristics of 
the assembly. 

The model identified in the above way is applied to fault detection. One
of the residuals is calculated on the grounds of the output of the model.




Fig. 11.11. Valve model F curve 



Figure 11.12 shows an example of a time series that reflects the quality of the
modelling of the water flow through the control valve. The upper lines show
the real value of the flow and the one calculated on the grounds of the
model, while the lower line shows the relative difference between these
signals, i.e., the residual value. Modelling errors change within the range of
±5%. Limitations in obtaining a better modelling quality result not from
limits related to fuzzy model features but from a hysteresis of about 5%
existing within the diagnosed system. Due to the high dynamics of the system
in relation to the sampling time applied, which was equal to one second, the
static model represents sufficiently well the character of the operation of
the system.




Fig. 11.12. Example of the modelling quality for the model F



Figure 11.13 presents an example of fault detection with the use of a residual 
based on the presented model. 




Fig. 11.13. Example of fault detection - the residual ri 

In the case of a lack of faults, the residual value oscillates around zero.
The dashed lines in the figure denote time intervals during which particular
faults were simulated. The black rectangles denote the faults to which the
residual based on the valve model should be sensitive (a description of the
faults can be found in Chapter 11.3.4). The horizontal dashed lines show
approximate boundaries within which the residual value can change in the
case of the lack of faults. As can be seen, in the case of the existence of
faults which should be detected, the residual value differs distinctly from
zero. This is the basis on which fault detection is carried out. In the case
of faults to which the residual is not sensitive, its value differs slightly
from zero but remains within the boundaries that denote a positive test
result. Such differences are caused by higher modelling errors existing in the
range of the new point of operation to which the system goes as a result of
fault introduction.



11.3. Fault isolation with the use of fuzzy logic 

Fuzzy logic is applied both to diagnosing algorithms based on the
methodology of pattern recognition and to automatic inference mechanisms.
Patterns in the fuzzy classification method are represented as fuzzy sets.
Different algorithms may be used to create them; the C-centre algorithm
(Bezdek, 1991) is among the most popular ones. The classification proper
consists in calculating the degree of the membership of the current pattern,
represented, for instance, by the vector of residuals, to the particular
reference patterns obtained at the clustering stage. Some examples
of the application of fuzzy classification to fault isolation are presented in
(Frelicot and Dubuisson, 1993; Peltier and Dubuisson, 1994).

In the case of automatic inference algorithms, the rule base is often
created on the grounds of an expert’s knowledge. Such procedures are
used, for example, in the methods of fault isolation which apply the binary
diagnostic matrix (Koscielny, 2001; Maquin and Ragot, 2000; Sędziak, 2001),
parity equations (Miguel et al., 1997), banks of observers (Lee and Vagner,
1997), or SDG graphs (Montmain and Leyval, 1994; Tarifa and Scenna, 1997).

Fuzzy neural networks are an alternative description of the fuzzy rule
base. The fuzzy neural network structure (Piegat, 2001; Rutkowska, 2002)
results directly from the diagram of the fuzzy rule base: consecutive layers
correspond to fuzzyfication, inference and defuzzyfication. Thus, it is possible
to apply learning algorithms, which are typical of neural networks, and the
network itself is not a black box. It is possible to convert the network to a
set of rules, thanks to which the network has its physical interpretation. The
concept of fault isolation on the basis of FNN is presented in (Koscielny, 2001;
Leonhardt and Ayoubi, 1997). Fault isolation with the application of FNN is
therefore a union of pattern recognition methods and automatic inference.

Fuzzy neural networks that have a slightly different structure are also
applied to fault isolation. In their first layers, the fuzzy evaluation of signals
is carried out, and then the signals are fed to the inputs of a conventional
neural network. In networks of this kind, an explicit representation of the
premises of diagnostic rules does not exist, and the network has a higher
ability of generalisation. On the other hand, obtaining a fuzzy model
comprehensible to humans on the grounds of the analysis of the network weights
is difficult or outright impossible. Such networks are used when knowledge
about the diagnostic relation is gained by automatic learning from process
data obtained in states with particular faults, as well as in the state of
complete efficiency. Such a solution is applied in hierarchic fuzzy
neural networks (Mendes et al., 2001).

It is possible to find other methods of creating the fuzzy rule base and
of inference on the basis of fuzzy logic in the literature. For example,
Füssel et al. (1997) created rules on the grounds of a diagnostic tree
obtained with the use of the Self-Learning Classification Tree (SELECT)
method. The inference is carried out on the basis of an AND/OR-type
cause-result tree (Ulieru, 1993).

Fault isolation algorithms which are presented below are the results of
the works by (Koscielny, 1999; Koscielny et al., 1999a; Koscielny and Syfert,
2000a; 2000b; Sędziak, 2001; Syfert, 2003).



11.3.1. Fuzzy evaluation of residual values 

The threshold evaluation of residual values has serious disadvantages. It is 
difficult to ascertain threshold values both theoretically and experimentally. 
The evaluation of residual values on the grounds of a threshold test may be 
deceptive, and sometimes leads to inference discrepancy or a false diagnosis. 
The application of fuzzy logic allows us to take the uncertainties of diagnostic 
signals into account. 

A linguistic variable that describes test results (diagnostic signal values)
may be attributed to each one of the residuals $r_j$. The linguistic variable
space $V_j$ is the set of all linguistic values that are applied to the
evaluation of this residual. Fuzzy sets spread on the axis of the residual
correspond to particular values of the linguistic variable. In the simplest
case, the set $V_j$ contains two results: a positive one and a negative one,
i.e., $V_j = \{P, N\}$. Figure 11.14 presents two-value fuzzy residual
evaluation.

Certainly, if necessary, the set of possible values of the linguistic variable
may be expanded by taking into account the size and sign of the detected
symptom. In each case, however, the set of linguistic values contains one
value which corresponds to the positive test result (the residual value is
nearly equal to zero) and implies a lack of a symptom, as well as other
values, which denote fault symptoms. Figure 11.15 presents three-value fuzzy
residual evaluation.

Fig. 11.15. Three-value fuzzy residual evaluation

Diagnostic signals in such an approach are fuzzy variables. A general
formula that describes a multi-value fuzzy diagnostic signal looks as follows:

$$s_j = \{\langle \mu_{ji}, v_{ji} \rangle : v_{ji} \in V_j\}, \tag{11.26}$$

where $\mu_{ji}$ denotes the function of the membership of the $j$-th residual
to the fuzzy set $v_{ji}$, and $V_j$ is the set of linguistic values of the
$j$-th diagnostic signal.

For instance, the $j$-th diagnostic signal is defined in two-value evaluation
as follows:

$$s_j = \{\langle \mu_{jP}, P \rangle, \langle \mu_{jN}, N \rangle\}, \tag{11.27}$$

where $P$ denotes a fuzzy set containing the values of the residual in the
fault-free state (a positive test result), and $N$ is a fuzzy set containing
the values of the residual in states with faults (a negative test result).

In the case of two-value fuzzy residual evaluation, it is advisable that
the following formula be fulfilled: $\mu_{jP} + \mu_{jN} = 1$. In order to
define the fuzzy diagnostic signal (11.27), it is then necessary only to know
the function of membership to the set $P$.

Similarly, the diagnostic signal has the following form in three-value
evaluation:

$$s_j = \{\langle \mu_{jP}, P \rangle, \langle \mu_{jN+}, N{+} \rangle, \langle \mu_{jN-}, N{-} \rangle\}, \tag{11.28}$$

where $N{+}$ denotes a fuzzy set containing the positive values of the residual
in states with faults, and $N{-}$ is a fuzzy set containing the negative values
of the residual in states with faults.

The following division of the diagnostic signal value range into partitions may
be given as an example of multi-value evaluation:

$$V_j = \{P, NS{-}, NM{-}, NB{-}, NS{+}, NM{+}, NB{+}\}, \tag{11.29}$$

where $NS{-}$, $NM{-}$, $NB{-}$ are fuzzy sets corresponding to negative test
results having negative values: low, average, and high, respectively, and
$NS{+}$, $NM{+}$, $NB{+}$ are fuzzy sets corresponding to negative test results
having positive values: low, average, and high, respectively. The fuzzy
diagnostic signal value is therefore defined by the coefficients of the
membership of the calculated residual value to particular fuzzy sets.
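
A three-value fuzzy evaluation of a residual can be sketched in Python as follows. The piecewise-linear shape of the sets, the breakpoints `inner` and `outer`, and the requirement that the three membership degrees sum to one (stated in the text only for the two-value case) are assumptions chosen for the example; the function names are hypothetical.

```python
def ramp_up(x, lo, hi):
    """Piecewise-linear ramp: 0 for x <= lo, 1 for x >= hi (lo < hi)."""
    return min(1.0, max(0.0, (x - lo) / (hi - lo)))


def evaluate_residual(r, inner=2.0, outer=5.0):
    """Return the fuzzy diagnostic signal for a residual value r.

    |r| <= inner is evaluated fully as P; r >= outer fully as N+;
    r <= -outer fully as N-. Requires 0 < inner < outer.
    """
    mu_n_plus = ramp_up(r, inner, outer)
    mu_n_minus = ramp_up(-r, inner, outer)
    mu_p = 1.0 - mu_n_plus - mu_n_minus
    return {"P": mu_p, "N+": mu_n_plus, "N-": mu_n_minus}


for r in (0.5, 3.5, -6.0):
    print(r, evaluate_residual(r))
```

A residual between the two breakpoints belongs partly to P and partly to one of the fault sets, so no single sharp threshold has to be chosen.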

11.3.2. Rules of inference 

Fuzzy diagnostic inference is carried out on the grounds of a rule base
defining the relationship existing between diagnostic signals and faults or
system states. Let us assume that only the system states with single faults,
as well as the state of complete efficiency, will be taken into account.
Inference rules can be derived from the binary diagnostic matrix or the
information system (Chapter 2.5), or they may be formulated directly using an
expert’s knowledge. Depending on their origin, they can have different forms.

The following rules result from the binary diagnostic matrix supplemented
with the state of complete efficiency:

$R_0$: If ($s_1 = P$) and ... and ($s_j = P$) and ... and ($s_J = P$)
then the state of complete efficiency $z_0$, (11.30)

$R_k$: If ($s_1 = N$) and ... and ($s_j = N$) and ... and ($s_J = P$)
then the state with the fault $f_k$. (11.31)

The value of 0 appearing in the matrix corresponds to the positive test result
$P$, and the value of 1 to the negative test result $N$.

Let us notice that the rule base derived from the binary diagnostic matrix
may contain contradictory rules, i.e., rules having the same premises
but different conclusions. Such rules correspond to faults (states with single
faults) that are indistinguishable. When the elementary blocks, which group
indistinguishable faults, are separated, rules corresponding to these blocks
are obtained. Such rules are not contradictory and may be written down in one
of the following two forms:

$R_m$: If ($s_1 = P$) and ... and ($s_j = N$) and ... and ($s_J = N$)
then a fault belonging to the elementary block $E_m$, (11.32)

$R_m$: If ($s_1 = P$) and ... and ($s_j = N$) and ... and ($s_J = N$)
then the state with the fault $f_k$ or $f_m$ or ... . (11.33)

The rule for the state of complete efficiency is identical to the rule (11.30).
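
The derivation of rules from a binary diagnostic matrix, including the grouping of indistinguishable faults into elementary blocks, can be sketched in Python. The matrix below is a made-up example stored column by column (one signature per fault); the fault names and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical binary diagnostic matrix: each entry is one column, i.e. the
# signature of a fault over the diagnostic signals s1..s3 (1 = sensitive).
matrix = {
    "f1": (1, 0, 1),
    "f2": (0, 1, 1),
    "f3": (1, 0, 1),   # same signature as f1: indistinguishable from it
    "f4": (0, 1, 0),
}


def signature_to_premise(signature):
    """Map 0 to the positive test result P and 1 to the negative one N."""
    return tuple("N" if bit else "P" for bit in signature)


def build_rules(matrix):
    """Group faults with equal signatures; add the rule for the state z0."""
    n_signals = len(next(iter(matrix.values())))
    rules = defaultdict(list)
    rules[("P",) * n_signals].append("z0")   # state of complete efficiency
    for fault, signature in matrix.items():
        rules[signature_to_premise(signature)].append(fault)
    return dict(rules)


rules = build_rules(matrix)
for premise, conclusion in rules.items():
    condition = " and ".join(f"(s{j + 1} = {v})" for j, v in enumerate(premise))
    print(f"If {condition} then {' or '.join(conclusion)}")
```

Faults f1 and f3 end up in one rule, which corresponds to the second of the two forms given above; a fault with an all-zero signature would fall into the same block as the state of complete efficiency, i.e. it could not be detected with this set of signals.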

Rules that result from the information system have a more general form.
In the case of a simple Fault Information System (FIS), the relationship which
exists between diagnostic signal values and faults may be defined in the form
of rules corresponding to particular FIS columns:

$R_k$: If ($s_1 = v_{k1}$) and ... and ($s_j = v_{kj}$) and ... and ($s_J = v_{kJ}$)
then the state with the fault $f_k$. (11.34)

In the case of the approximate FIS, the rules corresponding to particular
faults are as follows:

$R_k$: If ($s_1 \in V_{k1}$) and ... and ($s_j \in V_{kj}$) and ... and ($s_J \in V_{kJ}$)
then the state with the fault $f_k$. (11.35)

The rules (11.35) may also be presented in the following form:

$R_k$: If [($s_1 = v_a$) or ... or ($s_1 = v_c$)] and ... and
[($s_J = v_b$) or ... or ($s_J = v_g$)]
then the state with the fault $f_k$. (11.36)

Fault distinguishability in the approximate FIS is not constant but de- 
pends on the combinations of the obtained diagnostic signal values. Thus, 
the particular rule conclusions are not separate, so the rules themselves can 
be contradictory. 

The inference is carried out on the grounds of the above rules. Let us 
assume that the rule base may contain contradictory rules. Such a base is 
usually incomplete since it does not contain rules for all combinations of 
diagnostic signal values but only for those that correspond to the columns of 
the binary diagnostic matrix, or the information system FIS. 

Fig. 11.16. Fuzzy fault isolation system



11.3.3. Fuzzy diagnostic inference 

A typical fuzzy inference system contains three blocks: the fuzzyfication
block, the inference block, and the defuzzyfication block. The input and
output signals of the system are both crisp. Such a situation exists in the
case of fuzzy modelling and control. In the case of fault isolation, the
structure of the fuzzy inference system is simpler, and contains only two
blocks: the fuzzyfication block and the inference block. The continuous values
of the residuals act as the input signals. They are fuzzyfied in the
fuzzyfication block, whose output signals are fuzzy diagnostic signals. They
act as the inputs of the inference block. The states of the system together
with the certainty factors that are attributed to these states act as
diagnostic system outputs.

The fuzzy fault isolation system structure is presented in Fig. 11.16. The 
first stage of fuzzy inference consists in defining the activation level for all 
of the rules. The level can take values belonging to the [0, 1] range. If the 
activation level of a rule is equal to zero, the rule does not take part in the 
further process of inference. 

In the case of inference using the binary diagnostic matrix, the premise
of each rule has the form of the conjunction of the simple premises (11.30)
and (11.31). Therefore, let us define the degree of fulfilment of the simple
premise for the $j$-th diagnostic signal in the $k$-th rule. It results from
comparing a particular fuzzy diagnostic signal with its value contained in the
analysed rule. The degree is equal to the coefficient of the membership of the
$j$-th diagnostic signal to the fuzzy set which has the linguistic value
indicated in the $k$-th fault rule, i.e., in the fault signature:

$$\mu(z_k, s_j) = \begin{cases} \mu_{jP} & \text{for } v_{kj} = P, \\ \mu_{jN} & \text{for } v_{kj} = N. \end{cases} \tag{11.37}$$



The degree of fulfilment of the simple premise can therefore be interpreted as 
the degree of consistency of the obtained diagnostic signal with its pattern 
value appearing in the rule. 

The degree of fulfilment of the rule premise for the rules (11.30)
and (11.31), which are conjunctions of simple premises, is defined according
to fuzzy logic rules, which can be presented in a general form as

$$\mu(z_k) = \mu(z_k, s_1) \otimes \mu(z_k, s_2) \otimes \dots \otimes \mu(z_k, s_J), \tag{11.38}$$

where $\otimes$ is a general operator of fuzzy conjunction, $z_0$ denotes the
state of complete efficiency, $z_k$ is the state with the fault $f_k$, and $J$
denotes the number of diagnostic signals.

In order to calculate the activation level of the $k$-th rule, t-norm
operators, e.g., PROD or MIN, are applied. Functions that are not t-norms can
also be used to realise the fuzzy product, for example the so-called averaging
operators. One of the most popular is MEAN, which realises the arithmetic
mean. The activation level of the rules (11.30) and (11.31) is therefore
calculated as follows:

• for the PROD operator, it is the product of the coefficients of fulfilment
of all simple premises that exist in the rule:

$$ \mu(z_k)_{PROD} = \mu(z_k, s_1)\, \mu(z_k, s_2) \cdots \mu(z_k, s_J); \tag{11.39} $$

• for the MIN operator, it takes the lowest value of the simple premise
fulfilment coefficient in the rule:

$$ \mu(z_k)_{MIN} = \mathrm{MIN}\{\mu(z_k, s_1), \mu(z_k, s_2), \ldots, \mu(z_k, s_J)\}; \tag{11.40} $$

• for the MEAN operator, it is calculated as the arithmetic mean value
of the simple premises:

$$ \mu(z_k)_{MEAN} = \frac{\mu(z_k, s_1) + \mu(z_k, s_2) + \cdots + \mu(z_k, s_J)}{J}. \tag{11.41} $$

Let us notice that the PROD and MEAN operators applied to the cal- 
culation of the activation level of a rule use the fulfilment coefficients of all 
simple premises, while the MIN operator uses exclusively the lowest fulfil- 
ment coefficient of one out of a set of simple premises. 
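The three operators can be compared with a few lines of code. The sketch
below is only an illustration: the function name `activation_level` and the
sample premise values are assumptions introduced here, not part of the
original text.

```python
from math import prod


def activation_level(premises, operator="PROD"):
    """Activation level of one rule.

    `premises` holds the degrees of fulfilment mu(z_k, s_j) of the simple
    premises of the rule, one value in [0, 1] per diagnostic signal.
    """
    if not premises:
        raise ValueError("a rule needs at least one simple premise")
    if operator == "PROD":   # (11.39): product of all coefficients
        return prod(premises)
    if operator == "MIN":    # (11.40): lowest coefficient only
        return min(premises)
    if operator == "MEAN":   # (11.41): arithmetic mean, not a t-norm
        return sum(premises) / len(premises)
    raise ValueError(f"unknown operator: {operator}")


# Illustrative values only: five premises, two of them uncertain.
example = [1.0, 0.9, 1.0, 0.7, 1.0]
for name in ("PROD", "MIN", "MEAN"):
    print(name, round(activation_level(example, name), 2))
```

A single premise equal to zero forces the PROD and MIN results to zero, whereas
MEAN stays positive as long as any other premise is fulfilled to some degree.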

The conclusions of rules have the character of one-element sets (single- 
tons) which have the membership degree equal to one. Due to inference, it 
is possible to define the membership function of the conclusions of partic- 
ular rules for particular input values. By using the MIN operator, one can 
calculate that the value of the membership coefficient of the conclusions of 
a particular rule is equal to the activation level of this rule. Therefore, the 
higher the activation level of a rule, the higher the factor of the certainty 
that the state with the fault indicated in the conclusion appeared. The fault 
certainty factor equals the activation level of the appropriate rule. 

The membership function of the rule base conclusion has a discrete character.
It contains the activation levels of the active rules. They are interpreted as
the coefficients of the certainty of the existence of the system states
defined in the active rules. This information is the diagnosis generated at
the output of the fuzzy fault isolation system. It has the form of a set of
pairs (system state, coefficient of the certainty of its existence):

$$ DGN = \{ (z_k, \mu(z_k)) : \mu(z_k) > 0 \} \quad \text{for } k = 0, 1, \ldots, K, \tag{11.42} $$

where $z_0$ denotes the state of complete efficiency, $z_k$ is the state with
the fault $f_k$, and $K$ is the number of system faults.

The application of the PROD operator to the calculation of the activation
level of a rule has a vital advantage. It allows us to state whether or not
the rule base is complete and consistent. The base of $N$ rules is complete
and consistent (Piegat, 2001) if the sum of the activation levels of all rules
for any state of inputs (any diagnostic signal values) is equal to one:

$$ \sum_{n=1}^{N} \mu(z_n)_{PROD} = 1. \tag{11.43} $$

The number of rules in a complete base, with the binary residual evaluation,
equals $N = 2^J$. The rule base defined using the binary matrix of
elementary blocks is not complete in the sense that it does not contain rules
for all possible combinations of the linguistic values of diagnostic signals.
Not all such combinations are possible in reality. The rule base is created
for the state of the complete efficiency of the system, as well as for states
with single faults. In practice, one can never be sure whether or not the
accepted set of faults contains all possible faults. Rules not taken into
account in the base correspond, among other things, to system states with
multiple faults, states with omitted faults, and impossible combinations of
diagnostic signal values.

The base of rules of the type (11.32) and (11.33), which result from the
binary matrix of elementary blocks, is consistent. The sum of the activation
levels of all rules in the base, calculated with the use of the PROD
operator, is defined by

$$ \mu_\Sigma = \sum_{k=0}^{K} \mu(z_k, s_1)\, \mu(z_k, s_2) \cdots \mu(z_k, s_J). \tag{11.44} $$

With a lack of contradictory rules, the value of this index belongs to the
[0, 1] range:

$$ \mu_\Sigma \le 1, \tag{11.45} $$

and denotes the measure of the certainty of the obtained diagnosis.

The closer to one the value of this sum, the more certain the diagnosis. A
low value of the sum may mean that not all of the faults have been taken into
account in the base, that multiple faults have appeared, or that false
diagnostic signal values have appeared due to strong disturbances.

The contradictory rules (11.31) appear if they are derived on the basis
of the binary diagnostic matrix without separating elementary blocks. They
correspond to indistinguishable faults. In such cases, the value of the index
$\mu_\Sigma$ does not exceed the number $H$ of indistinguishable faults
indicated in the diagnosis:

$$ \mu_\Sigma \le H. \tag{11.46} $$

In the case when diagnostic signal values are completely consistent with those
given in the rules, the value of the index $\mu_\Sigma$ equals the number of
indistinguishable faults.

In order to calculate the activation level of a rule, the following formula
may also be applied:

$$ \mu(z_k)_{PROD/\Sigma} = \frac{\mu(z_k)_{PROD}}{\mu_\Sigma} = \frac{\mu(z_k, s_1)\, \mu(z_k, s_2) \cdots \mu(z_k, s_J)}{\sum_{i=0}^{K} \mu(z_i, s_1)\, \mu(z_i, s_2) \cdots \mu(z_i, s_J)}. \tag{11.47} $$

Let us denote this operator by the symbol PROD/$\Sigma$. It normalises the
calculated activation levels of the rules in such a way that their sum is
equal to one. However, this causes a loss of information about the
completeness and consistency of the rule base. Because of that, this operator
should be used together with the index $\mu_\Sigma$.
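The relation between the PROD activation levels, the certainty index and the
normalised levels can be sketched as follows. The helper names and the
two-rule base are hypothetical and serve only to illustrate the formulae
(11.39), (11.44) and (11.47); returning the levels unchanged when no rule is
active is a choice made here, not one stated in the text.

```python
from math import prod


def prod_activation(rule_premises):
    """PROD activation level (11.39) for every rule in the base."""
    return {state: prod(mus) for state, mus in rule_premises.items()}


def certainty_index(levels):
    """Sum of PROD activation levels over the rule base, as in (11.44)."""
    return sum(levels.values())


def normalise(levels):
    """Normalised (PROD/Sigma) activation levels, as in (11.47)."""
    total = certainty_index(levels)
    if total == 0.0:
        # No rule is active: nothing to normalise (a choice made in this sketch).
        return dict(levels)
    return {state: level / total for state, level in levels.items()}


# Hypothetical two-rule base with premise degrees for five signals.
rule_premises = {
    "z9": [1.0, 0.1, 1.0, 0.3, 1.0],
    "z10": [1.0, 0.9, 1.0, 0.7, 1.0],
}
levels = prod_activation(rule_premises)
index = certainty_index(levels)
normalised = normalise(levels)
print(round(index, 2), {s: round(v, 3) for s, v in normalised.items()})
```

The normalised values always sum to one, so only the separately stored index
shows how far the raw activation levels were from a complete, certain
diagnosis.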

In the case of a rule base derived from the information system, the form
of the rules (11.35) and (11.36) is more complex. In general, a rule is the
conjunction of complex premises that correspond to particular diagnostic
signals, and each complex premise is the logical sum of simple premises for
one diagnostic signal. The number of simple premises for each diagnostic
signal is equal to the number of values that the signal can take in a
particular state. The conclusion of each rule corresponds to one state of the
system. The rule conclusions are not disjoint, so the rules may be
contradictory, i.e., they may have the same premises (diagnostic signal
values) but indicate different system states, which are either unconditionally
or conditionally indistinguishable.

It is therefore necessary to define a method for calculating the degree
of fulfilment of the complex premise for the $j$-th diagnostic signal in the
$k$-th rule. It results from comparing a particular fuzzy diagnostic signal
with its values contained in the rule considered:

$$ \mu(z_k, s_j)_{FIS} = \mu(s_j \in V_{kj}) = \mu(v_{j1}) \oplus \mu(v_{j2}) \oplus \cdots \oplus \mu(v_{jL}), \tag{11.48} $$

where $\oplus$ denotes a general operator of the fuzzy alternative, $L$ is the
number of values of the diagnostic signal $s_j$ in the rule that defines the
state $z_k$, $L = |V_{kj}|$, and $V_{kj}$ is the subset of the pattern values
of the signal $s_j$ for the state $z_k$.

Various operators of fuzzy alternative can be applied. Those most often
used are s-norm operators. If the MAX operator is applied, then the degree
of fulfilment of the alternative premise for the $j$-th diagnostic signal is
equal to the maximum value of the degree of the membership of the $j$-th
diagnostic signal in the fuzzy sets that have the linguistic values indicated
in the $k$-th rule:

$$ \mu(z_k, s_j)_{FIS} = \mathrm{MAX}\{\mu(v_{j1}), \mu(v_{j2}), \ldots, \mu(v_{jL})\}. \tag{11.49} $$

Therefore, the degree of fulfilment of a premise for the $j$-th diagnostic
signal is interpreted as the degree of the consistency of the obtained
diagnostic signal with its pattern values appearing in the rule.

The rule activation level may be calculated with the use of one of the
following operators: PROD, MIN, MEAN, or PROD/$\Sigma$, according to the
formulae (11.39), (11.40), (11.41), and (11.47), respectively. Moreover, the
index (11.44) is calculated for the PROD operator. The diagnosis calculated
according to the formula (11.42) indicates system states for which the
rule activation level is higher than zero.
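For rules with alternative premises, the calculation has two stages: MAX
within each diagnostic signal, then a conjunction operator across signals.
The sketch below assumes three-valued signals with the hypothetical labels
"-1", "0" and "+1" and uses PROD as the conjunction; the data and function
names are illustrative only.

```python
from math import prod


def premise_degree(fuzzy_signal, pattern_values):
    """Degree of fulfilment of an alternative premise, MAX operator (11.49).

    `fuzzy_signal` maps each linguistic value to its membership degree and
    `pattern_values` is the non-empty set V_kj allowed by the rule.
    """
    return max(fuzzy_signal.get(value, 0.0) for value in pattern_values)


def rule_activation(fuzzy_signals, rule_patterns):
    """PROD conjunction of the alternative premises of one rule."""
    return prod(
        premise_degree(fuzzy_signals[name], values)
        for name, values in rule_patterns.items()
    )


# Hypothetical fuzzy diagnostic signals and one rule.
fuzzy_signals = {
    "s1": {"-1": 0.0, "0": 0.8, "+1": 0.2},
    "s2": {"-1": 0.3, "0": 0.1, "+1": 0.6},
}
rule = {"s1": {"0"}, "s2": {"-1", "+1"}}  # s1 = 0 and (s2 = -1 or s2 = +1)

print(round(rule_activation(fuzzy_signals, rule), 2))
```

With a single pattern value per signal, `premise_degree` reduces to the simple
premise of the binary case.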

11.3.4. Example of fault isolation 

In order to illustrate fuzzy diagnostic inference, a case of fault isolation in a 
three-tank system (Fig. 1.3) will be considered. The list of all possible faults 
in this system is given in Table 11.1. Let us assume that fault detection is 
realised with the use of five residuals generated on the grounds of the physical 
equations presented in Table 11.2. 



Table 11.1. Set of faults for a three-tank system

  $f_k$      Fault description
  $f_1$      fault of the flow sensor $F$
  $f_2$      fault of the level sensor $L_1$
  $f_3$      fault of the level sensor $L_2$
  $f_4$      fault of the level sensor $L_3$
  $f_5$      fault of the control path $U$
  $f_6$      fault of the control valve
  $f_7$      fault of the pump
  $f_8$      lack of medium
  $f_9$      partial clogging of the channel between the tanks $Z_1$ and $Z_2$
  $f_{10}$   partial clogging of the channel between the tanks $Z_2$ and $Z_3$
  $f_{11}$   partial clogging of the outlet
  $f_{12}$   leakage from the tank $Z_1$
  $f_{13}$   leakage from the tank $Z_2$
  $f_{14}$   leakage from the tank $Z_3$

Table 11.2. Diagnostic tests

  Diagnostic signal   Detection algorithm   Decision algorithm

  $s_1$   $r_1 = F - \hat{F} = F - f(U)$   $|r_1| < K_1$
  $s_2$   $r_2 = F - \alpha_{12} S_{12} \sqrt{2g(L_1 - L_2)} - A_1 \frac{dL_1}{dt}$   $|r_2| < K_2$
  $s_3$   $r_3 = \alpha_{12} S_{12} \sqrt{2g(L_1 - L_2)} - \alpha_{23} S_{23} \sqrt{2g(L_2 - L_3)} - A_2 \frac{dL_2}{dt}$   $|r_3| < K_3$
  $s_4$   $r_4 = \alpha_{23} S_{23} \sqrt{2g(L_2 - L_3)} - \alpha_3 S_3 \sqrt{2gL_3} - A_3 \frac{dL_3}{dt}$   $|r_4| < K_4$
  $s_5$   $r_5 = F - \alpha_3 S_3 \sqrt{2gL_3} - A_1 \frac{dL_1}{dt} - A_2 \frac{dL_2}{dt} - A_3 \frac{dL_3}{dt}$   $|r_5| < K_5$


  S/F     f1   f2   f3   f4   f5   f6   f7   f8   f9   f10  f11  f12  f13  f14
  s1      1    0    0    0    1    1    1    1    0    0    0    0    0    0
  s2      1    1    1    0    0    0    0    0    1    0    0    1    0    0
  s3      0    1    1    1    0    0    0    0    1    1    0    0    1    0
  s4      0    0    1    1    0    0    0    0    0    1    1    0    0    1
  s5      1    1    1    1    0    0    0    0    0    0    1    1    1    1

Fig. 11.17. Binary diagnostic matrix



The rule base created using the binary diagnostic matrix (Fig. 11.17) for this
system is as follows:

$R_0$: If $(s_1 = P)$ and $(s_2 = P)$ and $(s_3 = P)$ and $(s_4 = P)$ and $(s_5 = P)$
then state $z_0$ (fault free)

$R_1$: If $(s_1 = N)$ and $(s_2 = N)$ and $(s_3 = P)$ and $(s_4 = P)$ and $(s_5 = N)$
then state $z_1$ (fault $f_1$)

$R_2$: If $(s_1 = P)$ and $(s_2 = N)$ and $(s_3 = N)$ and $(s_4 = P)$ and $(s_5 = N)$
then state $z_2$ (fault $f_2$)

$R_3$: If $(s_1 = P)$ and $(s_2 = N)$ and $(s_3 = N)$ and $(s_4 = N)$ and $(s_5 = N)$
then state $z_3$ (fault $f_3$)

$R_4$: If $(s_1 = P)$ and $(s_2 = P)$ and $(s_3 = N)$ and $(s_4 = N)$ and $(s_5 = N)$
then state $z_4$ (fault $f_4$)

$R_5$: If $(s_1 = N)$ and $(s_2 = P)$ and $(s_3 = P)$ and $(s_4 = P)$ and $(s_5 = P)$
then state $z_5$ (fault $f_5$)

$R_6$: If $(s_1 = N)$ and $(s_2 = P)$ and $(s_3 = P)$ and $(s_4 = P)$ and $(s_5 = P)$
then state $z_6$ (fault $f_6$)

$R_7$: If $(s_1 = N)$ and $(s_2 = P)$ and $(s_3 = P)$ and $(s_4 = P)$ and $(s_5 = P)$
then state $z_7$ (fault $f_7$)

$R_8$: If $(s_1 = N)$ and $(s_2 = P)$ and $(s_3 = P)$ and $(s_4 = P)$ and $(s_5 = P)$
then state $z_8$ (fault $f_8$)

$R_9$: If $(s_1 = P)$ and $(s_2 = N)$ and $(s_3 = N)$ and $(s_4 = P)$ and $(s_5 = P)$
then state $z_9$ (fault $f_9$)

$R_{10}$: If $(s_1 = P)$ and $(s_2 = P)$ and $(s_3 = N)$ and $(s_4 = N)$ and $(s_5 = P)$
then state $z_{10}$ (fault $f_{10}$)

$R_{11}$: If $(s_1 = P)$ and $(s_2 = P)$ and $(s_3 = P)$ and $(s_4 = N)$ and $(s_5 = N)$
then state $z_{11}$ (fault $f_{11}$)

$R_{12}$: If $(s_1 = P)$ and $(s_2 = N)$ and $(s_3 = P)$ and $(s_4 = P)$ and $(s_5 = N)$
then state $z_{12}$ (fault $f_{12}$)

$R_{13}$: If $(s_1 = P)$ and $(s_2 = P)$ and $(s_3 = N)$ and $(s_4 = P)$ and $(s_5 = N)$
then state $z_{13}$ (fault $f_{13}$)

$R_{14}$: If $(s_1 = P)$ and $(s_2 = P)$ and $(s_3 = P)$ and $(s_4 = N)$ and $(s_5 = N)$
then state $z_{14}$ (fault $f_{14}$)

The rules $R_5$, $R_6$, $R_7$, $R_8$ as well as $R_{11}$ and $R_{14}$ are
contradictory since the faults $\{f_5, f_6, f_7, f_8\}$ and
$\{f_{11}, f_{14}\}$ are indistinguishable.

The results of inference with the use of different operators for calculating
the rule activation levels are given below:

(a) Let us assume that the following symptoms appeared:

$s_1 = \{(P, 1), (N, 0)\}$,
$s_2 = \{(P, 0.9), (N, 0.1)\}$,
$s_3 = \{(P, 0), (N, 1)\}$,
$s_4 = \{(P, 0.3), (N, 0.7)\}$,
$s_5 = \{(P, 1), (N, 0)\}$.

Table 11.3 presents the results of calculations of the activation level of
all rules with the use of the PROD, PROD/$\Sigma$, MIN and MEAN operators.
The diagnosis for particular operators is as follows:

$DGN_{PROD} = \{(z_{10}, 0.63), (z_9, 0.03)\}$,

$DGN_{PROD/\Sigma} = \{(z_{10}, 0.955), (z_9, 0.045)\}$,

$DGN_{MIN} = \{(z_{10}, 0.7), (z_9, 0.1)\}$,

$DGN_{MEAN} = \{(z_{10}, 0.92), (z_4, 0.72), (z_9, 0.68), (z_{13}, 0.64), (z_0, 0.64), (z_3, 0.56), \ldots\}$.

Table 11.3. Rule activation coefficients, the case (a)

  Operator   z0    z1    z2    z3    z4    z5    z6    z7    z8    z9     z10    z11   z12   z13   z14
  PROD       0     0     0     0     0     0     0     0     0     0.03   0.63   0     0     0     0
  PROD/Σ     0     0     0     0     0     0     0     0     0     0.045  0.955  0     0     0     0
  MIN        0     0     0     0     0     0     0     0     0     0.1    0.7    0     0     0     0
  MEAN       0.64  0.08  0.48  0.56  0.72  0.44  0.44  0.44  0.44  0.68   0.92   0.52  0.28  0.64  0.52


(b) Let us assume that the following symptoms appeared:

$s_1 = \{(P, 1), (N, 0)\}$,
$s_2 = \{(P, 1), (N, 0)\}$,
$s_3 = \{(P, 1), (N, 0)\}$,
$s_4 = \{(P, 0), (N, 1)\}$,
$s_5 = \{(P, 0), (N, 1)\}$.

Table 11.4 presents the results of calculations of the activation level of
all rules with the use of all four operators.



Table 11.4. Rule activation coefficients, the case (b)

  Operator   z0    z1    z2    z3    z4    z5    z6    z7    z8    z9    z10   z11   z12   z13   z14
  PROD       0     0     0     0     0     0     0     0     0     0     0     1     0     0     1
  PROD/Σ     0     0     0     0     0     0     0     0     0     0     0     0.5   0     0     0.5
  MIN        0     0     0     0     0     0     0     0     0     0     0     1     0     0     1
  MEAN       0.6   0.4   0.4   0.6   0.8   0.4   0.4   0.4   0.4   0.2   0.6   1     0.6   0.6   1



The diagnosis for particular operators is as follows:

$DGN_{PROD} = \{(z_{11}, 1), (z_{14}, 1)\}$,

$DGN_{PROD/\Sigma} = \{(z_{11}, 0.5), (z_{14}, 0.5)\}$,

$DGN_{MIN} = \{(z_{11}, 1), (z_{14}, 1)\}$,

$DGN_{MEAN} = \{(z_{11}, 1), (z_{14}, 1), (z_4, 0.8), \ldots\}$.

Let us analyse the obtained results. The signals $s_2$ and $s_4$ in the case
(a) are uncertain (they have a fuzzy character). The diagnoses obtained with
the use of the PROD, PROD/$\Sigma$ and MIN operators are similar to one
another. They indicate the state with the fault $f_{10}$, for which the
certainty coefficient is highest, or the state with the fault $f_9$, which has
a significantly lower value of the certainty coefficient. The PROD/$\Sigma$
operator gives higher activation levels than the PROD operator due to the
normalisation of the sum of the activation levels of all rules to one. The
diagnosis elaborated on the grounds of the MEAN operator gives non-zero values
of the certainty coefficient for all of the rules.

Case (b) shows that when the diagnostic signals have crisp values consistent
with a rule, the diagnosis generated with the use of the PROD, PROD/$\Sigma$
and MIN operators is completely consistent with the diagnosis elaborated on
the grounds of classical logic, and indicates the states with the
indistinguishable faults $f_{11}$ and $f_{14}$. However, due to normalisation,
for the PROD/$\Sigma$ operator both indicated faults have certainty factors
equal to 0.5.

Let us notice that the value of the sum $\mu_\Sigma$ (11.44) may be
interpreted as the index of the certainty of the diagnosis for the PROD
operator. It equals, respectively,

for the case (a): $\mu_\Sigma = 0.66$,

for the case (b): $\mu_\Sigma = 2$.

The faults indicated in the case (a) are distinguishable, therefore the value 
of the index is lower than one since the symptoms are uncertain and the rule 
base is incomplete. In the case (b), the value of the index is equal to the 
number of indistinguishable states indicated in the diagnosis. 
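The calculations of this example can be repeated with a short script. It is a
sketch under the following assumptions: the signatures are read from the rule
base and the binary diagnostic matrix (with $s_4 = P$ in the rule for the
fault $f_2$), and all names in the code are introduced here for illustration.

```python
from math import prod

# One letter (P or N) per diagnostic signal s1..s5, as read from the rule base.
SIGNATURES = {
    "z0": "PPPPP", "z1": "NNPPN", "z2": "PNNPN", "z3": "PNNNN",
    "z4": "PPNNN", "z5": "NPPPP", "z6": "NPPPP", "z7": "NPPPP",
    "z8": "NPPPP", "z9": "PNNPP", "z10": "PPNNP", "z11": "PPPNN",
    "z12": "PNPPN", "z13": "PPNPN", "z14": "PPPNN",
}


def diagnose(symptoms):
    """Activation levels for all rules and the certainty index (11.44).

    `symptoms` is a list of five dicts, one per signal, mapping the
    linguistic values "P" and "N" to their membership degrees.
    """
    table = {}
    for state, signature in SIGNATURES.items():
        mu = [symptoms[j][value] for j, value in enumerate(signature)]
        table[state] = {
            "PROD": prod(mu),
            "MIN": min(mu),
            "MEAN": sum(mu) / len(mu),
        }
    index = sum(row["PROD"] for row in table.values())
    for row in table.values():
        row["PROD/SUM"] = row["PROD"] / index if index > 0 else 0.0
    return table, index


case_a = [
    {"P": 1.0, "N": 0.0}, {"P": 0.9, "N": 0.1}, {"P": 0.0, "N": 1.0},
    {"P": 0.3, "N": 0.7}, {"P": 1.0, "N": 0.0},
]
case_b = [
    {"P": 1.0, "N": 0.0}, {"P": 1.0, "N": 0.0}, {"P": 1.0, "N": 0.0},
    {"P": 0.0, "N": 1.0}, {"P": 0.0, "N": 1.0},
]

table_a, index_a = diagnose(case_a)
table_b, index_b = diagnose(case_b)

for label, table, index in (("a", table_a, index_a), ("b", table_b, index_b)):
    active = {s: round(r["PROD"], 2) for s, r in table.items() if r["PROD"] > 0}
    print(f"case ({label}): DGN_PROD = {active}, index = {round(index, 2)}")
```

In the case (a) only the rules for $z_9$ and $z_{10}$ stay active under PROD,
and in the case (b) the two rules with the shared signature each reach the
level of one, which is why the index equals two.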

The inference was carried out in the analysed example using rules of the
type (11.30) and (11.31). Elementary blocks that contain indistinguishable
faults were not separated. Some examples of diagnosing for the three-tank
system using (11.32), (11.33), (11.35), or (11.36) are given in the monograph
(Koscielny, 2001). Contradictory rules do not appear in the case of elementary
block separation, and the value of the index $\mu_\Sigma$ cannot be higher
than one.



11.3.5. Uncertainty of the diagnostic signals-faults relation 

In the considerations so far, the uncertainty of the relation between
diagnostic signals and faults has not been taken into account. It was assumed
with absolute certainty that if the fault $f_k$ appeared, then the diagnostic
signal $s_j$ would take one of the defined values. In practice, one can expect
situations in which such a conviction is unjustified.

In order to take this uncertainty into account, a fuzzy diagnostic relation
was introduced in (Koscielny, 2001; Sędziak, 2001). The Fuzzy Fault Isolation
System (FFIS) developed by Sędziak (2001) is a generalisation of the fuzzy
diagnostic relation and the information system. In this case, uncertainty
concerns the conviction that if the fault $f_k$ appears, then the diagnostic
signal $s_j$ will take one of the values belonging to the set $V_{kj}$.

In order to take into account the fuzzy character of the relation which
exists between diagnostic signals and faults, a new structure called the FFIS
is created on the grounds of the FIS as follows:

$$ FFIS = \langle F, S, V_S, q, FR_{FS} \rangle = \langle FIS, FR_{FS} \rangle, \tag{11.50} $$

where $F$, $S$, $V_S$, $q$ are described as for the FIS according to the
equations (2.79) to (2.84), and $FR_{FS}$ is the fuzzy diagnostic relation
defined for the FIS.

The relation $FR_{FS}$ may be defined as

$$ FR_{FS} = \{ ((f_k, s_j), \mu_R(f_k, s_j)) : (f_k, s_j) \in F \times S,\ \mu_R : F \times S \to [0, 1] \}, \tag{11.51} $$

where $\mu_R(f_k, s_j)$ denotes the coefficient of the membership of values
that belong to the set $V_{kj}$ to the pair $(f_k, s_j)$; it may be
interpreted as the conviction that if the fault $f_k$ appears, then the
diagnostic signal $s_j$ will take one of the values belonging to the set
$V_{kj}$.

Let us define a method of calculating the activation levels for complex
premises. The complex premise $CP(f_k, s_j)$ takes the form

$$ CP(f_k, s_j) = (s_j \in V_{kj}) \wedge (FR_{FS}(f_k, s_j) = 1). \tag{11.52} $$

The membership degree of the complex premise (which concerns one diagnostic
signal) can be presented in the general form by the following formula:

$$ \mu(f_k, s_j)_{FFIS} = \mu(s_j \in V_{kj}) \otimes \mu_R(f_k, s_j), \tag{11.53} $$

where $\otimes$ denotes the fuzzy conjunction operator. The following
equations are fulfilled for system states with single faults,
$k = 1, \ldots, K$: $\mu_R(z_k, s_j) = \mu_R(f_k, s_j)$ and
$\mu(z_k, s_j)_{FFIS} = \mu(f_k, s_j)_{FFIS}$.

The PROD operator will be applied as the fuzzy conjunction operator in the
calculation of the activation levels of the rule premise. If, additionally,
the MAX operator (11.49) is applied to the calculation of the activation level
of the alternative premise $\mu(s_j \in V_{kj})$, the following formula for
the membership degree of the complex premise is obtained:

$$ \mu(f_k, s_j)_{FFIS} = \mathrm{MAX}\{\mu(v_{j1}), \ldots, \mu(v_{jL})\}\, \mu_R(f_k, s_j). \tag{11.54} $$

The formula (11.54) is therefore a generalisation of the equation (11.48).
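A minimal sketch of the complex premise with the MAX and PROD operators is
given below. The function name, the signal labels and the numerical values are
assumptions made for illustration; only the structure of the formula (11.54)
is taken from the text.

```python
def ffis_premise_degree(fuzzy_signal, pattern_values, relation_certainty):
    """Membership degree of a complex premise, following (11.54).

    `fuzzy_signal` maps linguistic values to membership degrees,
    `pattern_values` is the non-empty set of values expected for the fault,
    and `relation_certainty` is the coefficient of the fuzzy diagnostic
    relation for the (fault, signal) pair, a number in [0, 1].
    """
    if not 0.0 <= relation_certainty <= 1.0:
        raise ValueError("relation_certainty must lie in [0, 1]")
    alternative = max(fuzzy_signal.get(v, 0.0) for v in pattern_values)
    return alternative * relation_certainty


# Hypothetical signal: the residual is mostly judged as "N" (symptom present).
signal = {"P": 0.2, "N": 0.8}

certain = ffis_premise_degree(signal, {"N"}, 1.0)     # fully trusted relation
doubtful = ffis_premise_degree(signal, {"N"}, 0.75)   # partly trusted relation
print(certain, round(doubtful, 2))
```

When the relation coefficient equals one, the result is the plain MAX premise,
so the crisp fault-symptom relation appears as a special case.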



11.4. Fault isolation with the use of the fuzzy neural network 

Fuzzy neural networks (FNNs) may be applied to the implementation of fuzzy
residual evaluation, as well as to fault isolation. The features of such
networks are inherited from fuzzy systems and neural networks, which allows
them to be applied to diagnostic systems at the stage of fault isolation.
Unlike artificial neural networks, the fuzzy neural network is not a black
box. Importantly, an expert's knowledge about the diagnostic signal
values-faults relation may be written directly in the FNN as appropriate
weights of the network. At the same time, it is also possible to tune the
weights automatically with the use of training algorithms developed for neural
networks (Koscielny and Syfert, 2000b).

Fig. 11.18. Diagnostics with the use of the fuzzy neural network for fault
isolation

A general conception of diagnosing with the use of fuzzy neural networks 
for residual evaluation and fault isolation is presented in Fig. 11.18. In this 
solution, the fuzzy neural network is applied to the fuzzy evaluation of resid- 
uals and the results of simple diagnostic tests, and to fault isolation. The 
diagnosis indicates states with a single fault, the state of complete efficiency, 
and an unknown state, together with the certainty factor calculated for all of 
the states. Analytical, neural, fuzzy and fuzzy neural models as well as simple 
heuristic relationships may be applied to the calculation of the residuals. 

The structure of the fuzzy neural network that implements fuzzy residual
evaluation and fault isolation is presented in Fig. 11.19. It is a direct
adaptation of the fuzzy neural network with the Type I structure, with outputs
in the form of singletons, suggested by Horikawa and his co-workers (1991).
The network has six layers. Layers (A), (B) and (C) are responsible for
elaborating the values of the membership functions of the rule premises, i.e.,
fuzzy residual evaluation. An appropriate choice of the number of neurons
in Layers (B) and (C), as well as of the function realised in the neurons of
Layer (C), allows us to implement multi-value residual evaluation. The shape
of the membership function of each residual value may be set independently.

Fig. 11.19. Basic structure of the fuzzy neural network for residual
evaluation and fault isolation

Fault isolation is implemented in Layers (D), (E) and (F). These layers
correspond to Layers (D) and (E) of the network which is presented in
Section 11.2. Certainty factors of particular faults are calculated in these
layers. The minimum number of the outputs of the network corresponds to
the number $K$ of the analysed faults plus the coefficient of the state of
complete efficiency. Depending on network configuration, two additional
outputs may exist: the coefficient of an unknown state of the system,
$\mu_{US}$, and the diagnosis certainty index $\mu_\Sigma$.



11.4.1. Realisation of fuzzy residual evaluation by 
the fuzzy neural network 

Individual fuzzy two-, three-, or multi-value evaluation may be applied to
every residual or continuous diagnostic signal. Particular fuzzy sets may be
described by membership functions of any shape. In the application of
fuzzy neural networks, continuous functions are usually used for computational
reasons; the one most often applied is the Gaussian function.

The application of continuous functions is advisable in the case of tuning
network weights on the basis of available training data. In the case of fault
isolation, the collection of adequate sets of learning data is often very
difficult or outright impossible. It is then assumed that the network weights
are defined by an expert on the grounds of his or her knowledge about residual
evaluation as well as the diagnostic signals-faults relation. In such cases
the application of trapezoid membership functions is most natural.

Fuzzy residual evaluation is implemented in the first three layers of the
network. Layer (A) has a symbolic form only and is used for transmitting
the residual signals to the network, as well as a signal equal to one to
particular units of Layer (B). Layer (B) sums a residual with the weight $w_c$
multiplied by a signal having the value of one. The weight defines the
position of the centre of the membership function of the $n$-th fuzzy set of
the $j$-th residual. The weights $w_{g1}$ and $w_{g2}$ are responsible for the
width and the inclination of the trapezoid arms. The meaning of the weights is
presented in Fig. 11.20.




Fig. 11.20. Parameters of the trapezoid membership function 

The value of the function of membership to particular fuzzy sets,
i.e., the output of Layer (C), is calculated according to the following
formula:

$$ \mu_{jn_j}(r_j) = \ldots \tag{11.55} $$

Every operator of Layer (C) corresponds to one of the linguistic values
$v_{jn_j}$ of a particular residual. The number of neurons in Layers (B) and
(C) is equal to the sum of diagnostic signal values (fuzzy sets) which
describe all of the residuals.
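A trapezoid membership function can be sketched as below. The
parameterisation (centre, half-width of the flat top, width of each arm) is an
assumption chosen for readability; it is not the weight-based formula (11.55)
of the network, and the complement used for the second fuzzy set is likewise
only one possible choice for two-value evaluation.

```python
def trapezoid(r, centre, core_half_width, slope_width):
    """Symmetric trapezoid membership function (hypothetical parameterisation).

    Returns 1 within `core_half_width` of `centre`, falls linearly to 0 over
    `slope_width`, and is 0 further away.
    """
    distance = abs(r - centre)
    if distance <= core_half_width:
        return 1.0
    if slope_width <= 0.0:
        return 0.0  # degenerate case: rectangular (crisp) set
    return max(0.0, 1.0 - (distance - core_half_width) / slope_width)


def two_value_evaluation(r, core_half_width=1.0, slope_width=1.0):
    """Two-value fuzzy evaluation of a residual: P near zero, N elsewhere."""
    mu_p = trapezoid(r, 0.0, core_half_width, slope_width)
    return {"P": mu_p, "N": 1.0 - mu_p}


for residual in (0.0, 1.5, 3.0):
    print(residual, two_value_evaluation(residual))
```

With the default widths, a residual of 1.5 lies halfway down the arm and is
assigned equally to both fuzzy sets.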

11.4.2. Fault isolation in the fuzzy neural network 

Fault isolation is implemented in Layers (D) to (F) of the network. It is
carried out on the basis presented in Section 11.3. The structure of this part
of the network is defined by an expert on the basis of the knowledge of the
diagnostic signals-faults relation written, for example, in the fault
information system. The network construction is based on rules defined on the
grounds of the FIS.

The rules of inference in their general form have a conjunction-alternative
character:

$R_m$: If [(($s_1 = v_a$) or ... or ($s_1 = v_c$)) and ... and
(($s_J = v_b$) or ... or ($s_J = v_e$))]
then [the state with the fault $f_k$ or ... or $f_m$]. (11.56)

Each one of such rules may be replaced by a subset of rules that has the
structure of the conjunction of simple premises:

$R_m$: If [($s_1 = v_a$) and ... and ($s_J = v_c$)]
then [the state with the fault $f_k$]. (11.57)



These rules link the combinations of diagnostic signal values with system
states. It is assumed that contradictory rules may exist, namely when
indistinguishable faults exist. The premises of particular simple
rules (11.57) are represented by the neurons of Layer (D). To each one of the
inputs of the $\prod$ operators in Layer (D), one output $\mu_{jn_j}$ of
Layer (C) for every residual is introduced. The $\prod$ operator realises the
product of the degrees of fulfilment of simple premises. The output of the
$m$-th neuron of Layer (D), denoted here by $t_m$, is calculated by

$$ t_m = \prod_{j=1}^{J} \mu_{jn_j}(r_j). \tag{11.58} $$

The neurons in Layer (D) can represent all possible rules that have the
form (11.57), or only those that appear in the diagnostic matrix. Due to
the high number of possible rules, the latter solution is usually applied. The
neurons in Layer (D) represent particular rule premises. The maximum number
of neurons in Layer (D) is equal to

$$ N = \sum_{k=0}^{K} \prod_{j=1}^{J} n_{jk}, \tag{11.59} $$

where $n_{jk}$ is the number of possible values of the $j$-th diagnostic
signal which are attributed to the $k$-th fault in the FIS,
$n_{jk} = |V_{kj}|$.

The following operation is implemented in Layer (E):

$$ \tau_k = \sum_{m=0}^{N} w^F_{km}\, t_m, \tag{11.60} $$

where $t_m$ is the output of the $m$-th neuron of Layer (D).

The outputs of all elements of Layer (D) are introduced to the input of
each element of Layer (E). The weight $w^F_{km}$ is attributed to each
connection of the $k$-th element. It equals one if the combination of simple
premises represented in the neuron of Layer (D) is an element of the rule that
defines the $k$-th fault. Otherwise, the weight equals zero:

$$ w^F_{km} = \begin{cases} 0 & \text{if } \exists j:\ v_{mj} \notin V_{kj}, \\ 1 & \text{if } \forall j:\ v_{mj} \in V_{kj}, \end{cases} \tag{11.61} $$

where $v_{mj}$ denotes the value of the $j$-th diagnostic signal defined in
the simple premise of the $m$-th neuron of Layer (D).

The number of neurons in Layer (E) is equal to the number of faults
plus two additional neurons, which represent the state of complete efficiency,
$(K + 1)$, and an unknown state of the system, $(K + 2)$. The activation
level of particular rules, which is interpreted as the certainty factor of a
particular state, is calculated in this layer with the use of the PROD
operator. The element that represents an unknown state of the system is
usually applied only when all possible simple premises are represented in
Layer (D). In this case, the outputs of all neurons of Layer (D) which
represent signatures not belonging to the FIS are connected to this neuron.

Layers (E') and (F) have an auxiliary character. They are applied to network
output normalisation (in the case when the diagnosis should be calculated
similarly as with the PROD/$\Sigma$ operator). If the fuzzy inference scheme
with the PROD operator is applied, Layers (E') and (F) are not necessary.

Finally, the following values are obtained at the network outputs:

$$ \mu_k = \frac{\tau_k}{\sum_{n=0}^{K} \tau_n}, \tag{11.62} $$

where $\tau_k$ is the output of the $k$-th neuron of Layer (E).

When indistinguishable faults are indicated, the coefficient value is evenly
divided between these faults, as for the PROD/$\Sigma$ operator. For example,
for the diagnosis that indicates the indistinguishable faults $f_{11}$ and
$f_{14}$, both of them will be indicated with the same coefficient, equal
to 0.5.

Layer (D') has an auxiliary character. The index of the certainty of the
diagnosis is calculated in this layer. If the network structure in which all
fault signatures are represented is used, then a low value of the index
$\mu_\Sigma$ testifies directly to an incorrect choice of the coefficients of
fuzzy residual evaluation. In other cases, this coefficient may be applied
only to an indirect calculation of the coefficient of an unknown state of the
system ($\mu_{US} = 1 - \mu_\Sigma$).
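The fault isolation part of the network can be imitated with plain lists. The
sketch below is a toy example under stated assumptions: two residuals, two
stored signatures and three states are invented for illustration, the function
names are not taken from the text, and the certainty index is summed here over
the state outputs.

```python
from math import prod


def layer_d(memberships, signatures):
    """Layer (D): product of the premise degrees for each stored signature."""
    return [
        prod(memberships[j][value] for j, value in enumerate(signature))
        for signature in signatures
    ]


def layer_e(d_outputs, weights):
    """Layer (E): weighted sum of Layer (D) outputs, one 0/1 row per state."""
    return [sum(w * t for w, t in zip(row, d_outputs)) for row in weights]


def normalise(e_outputs):
    """Auxiliary layers: scale the state coefficients so that they sum to one."""
    total = sum(e_outputs)
    return [t / total for t in e_outputs] if total > 0 else list(e_outputs)


# Hypothetical toy network: two residuals, two stored signatures, three states.
signatures = [("P", "P"), ("N", "P")]
weights = [
    [1, 0],  # z0: fault-free state
    [0, 1],  # z1: first fault
    [0, 1],  # z2: second fault, same signature as z1 (indistinguishable)
]
memberships = [{"P": 0.0, "N": 1.0}, {"P": 1.0, "N": 0.0}]  # Layer (C) outputs

d_out = layer_d(memberships, signatures)
e_out = layer_e(d_out, weights)
index = sum(e_out)  # certainty index, summed here over the state outputs
print(d_out, e_out, normalise(e_out), index)
```

The two faults that share a signature receive the same coefficient after
normalisation, while the unnormalised index equals the number of faults that
cannot be told apart.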

The vital feature of fuzzy neural network application to fault isolation
is the way of obtaining the values of the weights $w^F$. As described above,
the weights are derived directly from an expert's knowledge written in the
diagnostic signals-faults relation. Such an approach is not possible in the
case of neural networks that are black boxes. It is also possible to tune
fuzzy neural network parameters with the use of measurement data.

• If the uncertainty of the relation existing between a particular signature
and the fault $f_k$ is taken into account at the level resulting from taking
the uncertainty of the $j$-th diagnostic signal symptom into consideration,
then the fault $f_k$ should be indicated with the coefficient (1 − …) by
a signature in which the questionable symptom does not appear (Fig. 11.21,
case C).



11.4.3. Example of fuzzy neural network application to fault isolation 

An example of fault isolation with the use of a fuzzy neural network for a
three-tank system (Fig. 1.3) is presented below. Let us consider the same
sets of possible faults and diagnostic signals (and residuals), as well as the
rule base, which were presented in the example in Subsection 11.3.4.
Two-value evaluation was applied to all of the residuals.

Based on the analysed set of faults, available set of residuals and inference 
rules, the FNN was designed and its weights were chosen. The values of 
the parameters of fuzzy residual evaluation were chosen on the grounds of 
residual value analysis in the case of a lack of faults. The network structure is 
presented in Fig. 11.22. Only such connections existing between Layers (D) 
and (E) were shown for which the weights have the value of one. The weights 
of the connections that exist between these layers but were not shown in 
Fig. 11.22 are equal to zero. 

In the network structure applied, only the signatures that are present 
in the diagnostic rule base are represented in Layer (D) of the network. 
In this case, a separate output that defines the coefficient of an unknown 
state of the system, $\mu_{US}$, was not used, since the following formula holds: 
$\mu_{US} = 1 - \mu_S$. 

Figures 11.23 and 11.24 show an example of the isolation of the fault fs. 
Figure 11.23 shows the simulated fault size as well as the values of the 
residuals vs. time. It can be seen that as the fault size grows, the values of 
the residuals $r_2$ to ... differ more and more from zero. The residual $r_1$ is 
insensitive to this simulated fault. 

Figure 11.24 shows, vs. time, the coefficients of the membership of the 
residuals in the diagnostic signal values that correspond to the fault fs, as 
well as the following coefficients: of the certainty that the fault fs has 
appeared, of the state of complete efficiency, and the index $\mu_S$. 

When all of the diagnostic signals have settled at their correct values (the 
signal N for the residual $r_2$ appeared last), the coefficient of the certainty 
that the fault fs has appeared reaches its maximum value. The coefficient of 
the state of complete efficiency, in turn, decreases to zero as soon as the 
first negative value of the diagnostic signals, i.e., N for the residual $r_4$, 
is set. 

Fig. 11.22. Fuzzy neural network that implements fuzzy residual 
evaluation and fault isolation for a three-tank system 



11.5. Summary 

Fault detection algorithms that apply fuzzy modelling techniques are being 
constantly developed. The ability to model non-linear systems is a great 
advantage of fuzzy techniques. They allow models of systems to be constructed 
on the grounds of measurement data, which is especially important when the 
analytical models of systems are not known. Such models map the operation of 
the system well within the range of signal changes on which they were trained. 
If this range is sufficiently wide, then fuzzy models obtained for non-linear 
systems will be more general than linear ones but less general than models 
obtained using physical equations. 

The possibility of joining an expert’s knowledge with available measure- 
ment data is a vital advantage of fuzzy models and fuzzy neural networks. 
The expert’s knowledge is applied to defining the structure as well as the 
initial values of the parameters of the model. The model itself is not a black 
box. It is a set of rules that can be verified and modified by an expert. 

Fig. 11.23. Changes of residuals in the case of the simulation 
of the linearly growing value of the fault fs 



The research into detection methods based on the fuzzy and neural models 
described above has been carried out at the Institute of Automatic Control 
and Robotics of the Warsaw University of Technology. The research was 
carried out for chosen parts of the first evaporation station at the Lublin 
Sugar Factory in Poland, with the use of measurement data obtained during 
a sugar campaign and archived by the monitoring system. The results of 
the experiments are described in (Koscielny et al., 1999a; 1999b; 2000). It 
is possible to state on the basis of the experiments that all of the examined 
methods have ensured a good modelling quality for relatively simple systems. 
Detection algorithms have been sensitive to the introduced faults. 

By comparing the calculation efforts for training, it is possible to say that 
Wang and Mendel's (WM) method allows models with a small number of inputs to 
be obtained significantly more quickly, and requires significantly lower 
calculation efforts, than the methods applied to fuzzy and perceptron network 
tuning. It also allows very large data sets to be used during training. 
The higher the number of training data for the normal state, the lower the 
probability of missing rules. However, only a modified version of the WM 
method has this advantage. In the case of the original version, a high number 
of training data increases the possibility of obtaining false rules in the 
presence of measurement disturbances. The highest calculation efforts are 
required for TSK models, in which, besides the fuzzy neural network weights, 
the linear (or non-linear) model parameters are also tuned for particular 
rules, which correspond to various ranges of system operation. These higher 
calculation efforts allow us to obtain more accurate models in comparison with 
simple fuzzy networks only in the case of models that have very simple 
structures. 

Fig. 11.24. Fault isolation process in the case of the 
simulation of the linearly growing value of the fault fs 

An appropriate choice of the structure of the model is very important, 
and in order to choose it properly, one must use both the knowledge about 
the system and the knowledge that concerns the modelling techniques 
applied. A proper preparation of training data determines to a high degree 
the correct operation of the fuzzy model in the future. In particular, it is 
necessary to supply training data that encompass the whole range of system 
operation. Only then can the obtained model map the output signal well. 
Extrapolation past the training region may lead to high modelling errors. It 
is therefore necessary to design adequate software for choosing and preparing 
a set of training data out of the SCADA or DCS system archives. 

The number of rules in the fuzzy model grows rapidly with an increase 
in the number of inputs and the number of fuzzy sets for particular inputs. 
This limits their application to relatively simple systems. In order to lower 
the number of the input signals, their aggregation may be carried out. Neural 
models do not possess this disadvantage but even in this case it is advisable 
to use partial models of the system, as well as input variable aggregation. 
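
The growth in the number of rules can be illustrated with a short calculation. The sketch below assumes a complete (grid-type) rule base, i.e., one rule for every combination of the fuzzy sets defined for the inputs; this assumption, as well as the function name rule_count, is introduced here only for illustration.

```python
from math import prod

def rule_count(sets_per_input):
    """Number of rules in a complete grid rule base.

    Assumption: one rule exists for every combination of fuzzy sets,
    so the count is the product of the numbers of sets per input.
    """
    return prod(sets_per_input)

two_inputs = rule_count([3, 3])      # 2 inputs, 3 fuzzy sets each
six_inputs = rule_count([3] * 6)     # 6 inputs, 3 fuzzy sets each
six_fine = rule_count([5] * 6)       # 6 inputs, 5 fuzzy sets each
print(two_inputs, six_inputs, six_fine)
```

Under this assumption, aggregating several inputs into one removes whole factors from the product, which is why input aggregation reduces the rule base so effectively.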

Fault isolation algorithms based on fuzzy logic possess several advantages 
in comparison with other methods described earlier. One such advantage 
is the possibility of taking into account the uncertainties that exist in the 
process of diagnosing. 

The application of fuzzy logic to the evaluation of residuals allows us to 
take into account their uncertainty during the generation of a diagnosis. It 
ensures a higher resistance of the inference algorithm to measurement noise 
and disturbances than a diagnosis performed on the grounds of residual value 
threshold evaluation. A false test result (false diagnostic signal value) in the 
case of inference on the grounds of residual value threshold evaluation and 
classical logic may lead to a false diagnosis or make it impossible to 
formulate a diagnosis at all. On the other hand, a diagnosis generated with 
the use of fuzzy logic indicates possible states of the system together with 
the coefficients of their existence. Symptom uncertainties lower the values of 
the particular state certainty coefficients, but the diagnosis indicates all of 
the states that have signatures similar to the obtained diagnostic signals. 
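
The difference between threshold evaluation and fuzzy evaluation of a residual can be sketched as follows. This is only an illustration: the piecewise-linear membership function, the parameter names low and high, and the numerical values are assumptions and are not taken from the chapter.

```python
def crisp_symptom(residual, threshold):
    """Two-value threshold evaluation: the symptom is either present or absent."""
    return 1.0 if abs(residual) > threshold else 0.0

def fuzzy_symptom(residual, low, high):
    """Degree (0..1) to which the residual is regarded as a symptom.

    Assumed membership function: 0 below `low`, 1 above `high`,
    and a linear ramp in between.
    """
    x = abs(residual)
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)

# A residual fluctuating around the threshold flips the crisp result,
# while the fuzzy degree changes only slightly.
crisp_values = [crisp_symptom(r, 0.15) for r in (0.14, 0.16)]
fuzzy_values = [fuzzy_symptom(r, 0.10, 0.20) for r in (0.14, 0.16)]
print(crisp_values, fuzzy_values)
```

With the crisp evaluation, a small disturbance near the threshold changes the diagnostic signal completely, whereas the fuzzy degree of the symptom moves only a little.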

The second kind of uncertainty, which appears more rarely in practice, 
concerns the definition of the diagnostic signals-faults relation. In such cases, 
the fuzzy diagnostic relation should be applied. Other methods, except for 
isolation algorithms based on Bayes’ theory (Koscielny, 2001), do not allow 
taking into account the uncertainty of symptoms and the diagnostic relation. 
However, the application of fuzzy logic is considerably simpler in comparison 
with probabilistic inference, and allows in a natural way making use of an 
expert’s knowledge. 

The rules of the base of knowledge may be defined directly on the basis 
of an expert’s knowledge, or they may be derived from the binary diagnostic 
matrix or the FIS. The application of fuzzy logic to multi-value fuzzy residual 
evaluation and diagnostic inference together with the rule base generated on 
the grounds of the diagnostic signals-faults relation that has the form of the 
information system has advantages particularly desirable in the diagnosing 
algorithm. Besides the above-mentioned advantages, it is also possible to 
obtain higher fault distinguishability in comparison with algorithms to which 
the binary diagnostic matrix was applied. 

Fault isolation algorithms based on fuzzy logic may be presented in the 
form of a fuzzy neural network. Measurement data and known training 
algorithms used for neural networks can be applied to network tuning. Since 
a fuzzy neural network is not a black box, its structure and parameters may 
also be defined on the basis of an expert’s knowledge. This fact is of great 
practical importance since obtaining the sequences of training data for states 
with faults is not possible in the case of many industrial systems. 

Fault isolation algorithms presented in this chapter possess an important 
feature, i.e., the ability to recognise such states of the system which are not 
consistent with the rules contained in the data base. The states are 
characterised by a very low value of the index $\mu_S$ and result, among other 
things, from faults that are not taken into account by designers, or from 
multiple faults. 

OBSERVERS AND GENETIC 
PROGRAMMING IN THE IDENTIFICATION 
AND FAULT DIAGNOSIS 
OF NON-LINEAR DYNAMIC SYSTEMS§ 

12.1. Introduction 

It is well known that there is an increasing demand for modern systems to 
become more effective and reliable. This real-world development pressure has 
transformed automatic control, initially perceived as the art of designing a 
satisfactory system, into the modern science that it is today. The observed in- 
creasing complexity of modern systems necessitates the development of new 
control and supervision techniques. To tackle this problem, it is obviously 
profitable to have all the knowledge concerning system behaviour. Undoubt- 
edly, an adequate model of a system can be a tool providing such knowledge. 
Models can be useful for system analysis, e.g., to predict or to simulate system 
behaviour. Indeed, nowadays, advanced techniques for designing controllers 
are also based on system models. The application of models leads directly to 
the problem of system identification. 

The main objective of system identification is to obtain a mathemati- 
cal description of a real system of interest. In the case of phenomenological 
models, whose structures are built based on physical considerations, i.e., on 

§ Partially supported by the EU FP 5 Research Training Network DAMADICS: 
Development and Application of Methods for Actuator Diagnosis in Industrial Con- 
trol Systems (2000-2003), and within the grant of the State Committee for Scientific 
Research in Poland, KBN, No. 131/E-372/SPUB-M/5 PR UE/DZ 58/2001. 

* University of Zielona Gora, Institute of Control and Computation Engineering, 
ul. Podgorna 50, 65-246 Zielona Gora, Poland, 
e-mail: {M.Witczak, J.Korbicz}@issi.uz.zgora.pl 



J. Kacprzyk et al. (eds.), Fault Diagnosis 
© Springer-Verlag Berlin Heidelberg 2004 







M. Witczak and J. Korbicz 



physical laws governing the system that is being studied, the system 
identification problem reduces to the parameter estimation one. Given the 
structure of such a model and knowing that its parameters have a physical 
meaning, it is possible to predict their nominal values. This possibility 
greatly facilitates parameter estimation, especially for model structures which 
are non-linear in their parameters. On the other hand, the high complexity of 
a large majority of real systems makes it impossible to carry out the physical 
deliberations underlying phenomenological models. In such situations, 
behavioural models, which merely approximate the system input-output 
behaviour, have to be employed. 

While linear models (Ljung, 1987; Nelles, 2001; Walter and Pronzato, 
1997) may be fully acceptable in many cases, there are applications for which 
a detailed description of a system of interest is of great practical importance. 
This is the main reason for the further development of non-linear system 
identification theory. Indeed, a few decades ago, non-linear system identifica- 
tion was a field of several ad-hoc approaches, each applicable only to a very 
restricted class of systems. With the advent of neural networks, fuzzy models, 
and modern structure optimisation techniques, a much wider class of systems 
can be handled. 

The most popular classical non-linear identification methods usually employ 
various kinds of polynomials as a foundation for the model construction 
procedure (Billings et al., 1989; Farlow, 1984; Ivakhnenko, 1968; Nelles, 
2001), which combines structure determination with parameter estimation. 
In spite of the considerable usefulness of such approaches, there are 
applications for which polynomial models do not give satisfactory results. To 
overcome this problem, the so-called soft computing methods (Patton and 
Korbicz, 1999; Nelles, 2001) can be employed. The most popular approach 
is to use either neural networks (Hertz et al., 1991; Nelles, 2001) or fuzzy 
neural networks (Nuck et al., 1997; Nelles, 2001). Many works confirm their 
effectiveness and recommend their use. On the other hand, there is no 
efficient approach to selecting the structures of such networks. Thus, many 
experiments have to be carried out to obtain an appropriate configuration. 
An alternative approach, which seems to avoid such a difficulty, is to employ 
Genetic Programming (GP) (Esparcia-Alcazar, 1998; Gray et al., 1998; Koza, 
1992; Witczak and Korbicz, 2000a; 2000b; 2002a). GP is an extension of 
genetic algorithms (Michalewicz, 1996), which are a broad class of stochastic 
optimisation algorithms inspired by the biological processes that allow 
populations of organisms to adapt to their surrounding environment. The main 
difference between these two approaches is that in GP the evolving individuals 
are parse trees rather than fixed-length binary strings. The main advantage of 
GP over neural networks is that the models resulting from this approach are 
less complex (from the point of view of the number of parameters). This 
means that those models, in spite of the fact that they are of the behavioural 
type, are more transparent and hence they provide more information about 
system behaviour. Moreover, model structures resulting from this approach 
can be further reduced in a very intuitive way. 

Unlike in the past, modern control techniques should take 
into account the system’s safety. This requirement goes beyond the normally 
accepted safety-critical systems of nuclear reactors and aircraft, where safety 
is of paramount importance, to less advanced industrial systems. Therefore, 
it is clear that the problem of fault diagnosis constitutes an important subject 
in modern control theory. This is the main reason why the design and 
application of model-based fault diagnosis has received considerable attention 
during the last few decades. 

In a fault diagnosis task, the model of a real system of interest is utilised 
to provide estimates of certain measured and/or unmeasured signals. Then, in 
the most usual case, the estimates of the measured signals are compared with 
their originals, i.e., a difference between the original signal and its estimate 
is used to form a residual signal. This residual signal can then be employed 
for Fault Detection and Isolation (FDI). This means that the problems of 
system identification and fault diagnosis are closely related. 
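
The residual generation step described above can be sketched in a few lines. The fixed detection threshold and the function names residuals and detect are assumptions made for this illustration only; the paragraph itself does not prescribe how the residual is evaluated.

```python
def residuals(measured, estimated):
    """Residual signal: measured output minus its model-based estimate."""
    return [y - y_hat for y, y_hat in zip(measured, estimated)]

def detect(residual_signal, threshold):
    """Flag the samples at which the residual magnitude exceeds the threshold.

    Assumption: a constant threshold is used as the simplest decision rule.
    """
    return [abs(r) > threshold for r in residual_signal]

measured = [1.0, 1.1, 0.9, 2.0]    # hypothetical measurements
estimated = [1.0, 1.0, 1.0, 1.0]   # hypothetical model outputs
alarms = detect(residuals(measured, estimated), threshold=0.5)
print(alarms)
```

The quality of the estimates directly determines how small the threshold can be made, which is the sense in which identification and diagnosis are coupled.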

Regardless of the identification method used, there is always the problem 
of model uncertainty, i.e., the model-reality mismatch. Thus, the better the 
model used to represent system behaviour, the better the chance of improv- 
ing reliability and performance in diagnosing faults. Unfortunately, distur- 
bances as well as model uncertainty are inevitable in industrial systems, and 
hence there exists a pressure creating the need for robustness in fault diag- 
nosis systems. This robustness requirement is usually achieved in the fault 
detection stage. 

In the context of robust fault detection, many approaches have been 
proposed (Chen and Patton, 1999; Patton et al., 2000). Undoubtedly, the most 
common one is to use robust observers, such as the Unknown Input Observer 
(UIO) (Alcorta et al., 1997; Chen et al., 1996; Chen and Patton, 1999; Patton 
and Chen, 1997; Patton et al., 2000), which can tolerate a degree of model 
uncertainty and hence increase the reliability of fault diagnosis. In such an 
approach, the model-reality mismatch is represented by the so-called unknown 
input and hence the state estimate and, consequently, the output estimate are 
obtained taking into account model uncertainty. As in system identification, 
much of the work in this subject is oriented towards linear systems. This is 
mainly because the theory of observers (or filters in the stochastic case) is 
especially well developed for linear systems. 

Unfortunately, the existing non-linear extensions of the UIO (Alcorta et 
al., 1997; Chen et al., 1996; Chen and Patton, 1999; Patton and Chen, 1997; 
Seliger and Frank, 2000) require a relatively complex design procedure, even 
for simple laboratory systems (Zolghardi et al., 1996). Moreover, they are 
usually limited to a very restricted class of systems. One way out of this 
problem is to employ linearisation-based approaches, similar to the Extended 
Kalman Filter (EKF) (Anderson and Moore, 1979). In this case, the design 
procedure is as simple as that for linear systems. On the other hand, it is well 
known that such a solution works well only when there is no large mismatch 
between the model linearised around the current state estimate and the 
non-linear behaviour of the system. To settle this problem, the idea is to 
improve the convergence of linearisation-based observers (Witczak et al., 
2002b), thus making them more useful for real-world applications. 

The chapter is organised as follows. Section 12.2 presents a few possible 
approaches to the identification of discrete-time non-linear systems 
with genetic programming. In particular, identification schemes for both the 
input-output and state-space models are proposed. In Section 12.3, some 
elementary information regarding unknown input observers is recalled, 
the concept of the Extended Unknown Input Observer (EUIO) is introduced, 
and the design algorithm is described in detail. Then, the proposed 
approaches are tested using selected examples. In particular, the input-output 
vapour model, as well as the state-space models of the valve actuator 
and the temperature at the outlet of an evaporator (the apparatus model), 
are presented. All of the above examples are based on real data from 
the Lublin Sugar Factory in Poland. The main objective of the next example 
is to design a fault detection scheme for an induction motor with the 
use of the proposed observer. The last example is devoted to an industrial 
study regarding observer-based fault detection. Finally, Section 12.5 contains 
conclusions. 



12.2. Identification of non-linear dynamic systems 

12.2.1. Data acquisition and preparation 

The problem of data acquisition constitutes an important preliminary part of 
any system identification procedure, and is closely related to experiment 
design (Ljung, 1987; Walter and Pronzato, 1997). In the case of a known model 
structure, the problem reduces to an appropriate selection of experimental 
conditions with respect to the parameters to be estimated (Rafajlowicz, 1989; 
1996; Uciński, 1999; 2000; Walter and Pronzato, 1997). In the present work, 
the model structure is assumed to be unknown and hence the experimental 
conditions should be chosen in such a way as to provide maximum 
information about the system’s input-output behaviour. This is, of course, a 
very sophisticated problem and the reader is referred to (Ljung, 1987; Walter 
and Pronzato, 1997) and the references therein for further explanations. It 
should also be pointed out that the experiment design procedure is usually 
limited by practical constraints, e.g., in the process industry, it may not be 
permitted at all to manipulate a system in the production mode. 

Data collected in a physical plant are usually not in a form which is 
appropriate for use in the model construction procedure. This is mainly 
because of high- and low-frequency disturbances, offset levels, outliers, etc. 
In order to overcome such problems, the approaches described in (Ljung, 1987) 
can be employed. For readers familiar with the MATLAB System 
Identification Toolbox (Ljung, 1988), the problem reduces to simply applying 
appropriate procedures. 

The data, collected and appropriately prepared for the model construction 
procedure, are usually divided into two sets, namely, the identification ($n_t$ 
input-output measurements) data set and the validation ($n_v$ input-output 
measurements) data set. The former is used to obtain the model structure 
and to estimate its parameters, while the latter is employed to evaluate the 
goodness of fit of the identified model by making a comparison between the 
system output and the predicted model output either visually or by some 
formal distance measure. It should be pointed out that there exist many 
more or less sophisticated approaches to model validation. However, model 
testing by experimental data (cross-validation) is quite a good technique, and 
so it is often used in practice. 
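
A minimal sketch of this data splitting is given below. The static-gain model y = p*u, the mean squared error used as the formal distance measure, and all function names are assumptions chosen to keep the example short; they are not the models used later in the chapter.

```python
def split_data(data, n_t):
    """First n_t samples form the identification set, the rest the validation set."""
    return data[:n_t], data[n_t:]

def fit_gain(pairs):
    """Least-squares estimate of p in the assumed model y = p * u."""
    numerator = sum(u * y for u, y in pairs)
    denominator = sum(u * u for u, _ in pairs)
    return numerator / denominator

def mean_squared_error(pairs, p):
    """Distance measure between the system output and the model output."""
    return sum((y - p * u) ** 2 for u, y in pairs) / len(pairs)

# Hypothetical noise-free input-output measurements (u_k, y_k) with y = 2u.
data = [(float(k), 2.0 * k) for k in range(1, 11)]
identification, validation = split_data(data, n_t=7)
p_hat = fit_gain(identification)                    # estimated on identification data
fit = mean_squared_error(validation, p_hat)         # evaluated on validation data
print(p_hat, fit)
```

The parameter is never adjusted on the validation samples, so the error computed there indicates how well the identified model generalises.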

12.2.2. Model selection criteria 

Let $\mathcal{M} = \{M_i,\ i = 1, \ldots, n_m\}$ be a set of model structures 
that compete for the description of the same data. It comprises structures of 
different types and complexity. With each of these structures a parameter 
vector is associated. It is assumed that the most complex structure is that 
containing the greatest number of parameters. Once the set of model 
structures has been selected, the problem is to choose the best possible model. 
The criterion for this task is usually based on some scalar measures (cost 
functions). Such cost functions are usually obtained using the difference 
between the system output measurement and the model output $\hat{y}$. The 
output measurement consists of the true system output $y$ and the noise $v$, 
i.e., it is assumed that the output measurement equals $(y + v)$. 

In a probabilistic framework the expectation of the squared difference 
between the system output measurement and the model output can be used 
as a cost function, i.e., 

$$\mathcal{E}\{(y + v - \hat{y})^2\} = \mathcal{E}\{(y - \hat{y})^2\} + 2\mathcal{E}\{(y - \hat{y})v\} + \mathcal{E}\{v^2\}. \tag{12.1}$$

Assuming that the noise $v$ is uncorrelated with the model and system 
outputs, the equation (12.1) becomes 

$$\mathcal{E}\{(y + v - \hat{y})^2\} = \mathcal{E}\{(y - \hat{y})^2\} + \mathcal{E}\{v^2\}. \tag{12.2}$$

It is obvious that the term $\mathcal{E}\{v^2\}$ cannot be minimised as it 
constitutes the variance of the measurement noise. Thus the cost function 
reaches the minimum if $\hat{y} = y$. 

The model error $\mathcal{E}\{(y - \hat{y})^2\}$ can be further decomposed as 

$$\mathcal{E}\{(y - \hat{y})^2\} = \mathcal{E}\{[(y - \mathcal{E}\{\hat{y}\}) - (\hat{y} - \mathcal{E}\{\hat{y}\})]^2\} = (y - \mathcal{E}\{\hat{y}\})^2 + \mathcal{E}\{(\hat{y} - \mathcal{E}\{\hat{y}\})^2\}. \tag{12.3}$$

The two terms constituting the model error, i.e., $(y - \mathcal{E}\{\hat{y}\})^2$ 
and $\mathrm{var}(\hat{y}) = \mathcal{E}\{(\hat{y} - \mathcal{E}\{\hat{y}\})^2\}$, 
are called the bias and variance errors, respectively. 
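
The decomposition (12.3) can be checked numerically. The sketch below is only an illustration under stated assumptions: the true output is a constant, and the model output is a deliberately biased estimate (a shrunken sample mean, with the hypothetical factor shrink) recomputed over many simulated noisy data sets.

```python
import random

random.seed(0)
y_true = 2.0                      # true system output (assumed constant)
n_runs, n_samples, shrink = 2000, 10, 0.8

# Model output obtained from each simulated data set: a shrunken sample mean,
# which is biased (because of `shrink`) and random (because of the noise).
estimates = []
for _ in range(n_runs):
    data = [y_true + random.gauss(0.0, 0.5) for _ in range(n_samples)]
    estimates.append(shrink * sum(data) / n_samples)

mean_est = sum(estimates) / n_runs
model_error = sum((y_true - e) ** 2 for e in estimates) / n_runs
bias_sq = (y_true - mean_est) ** 2                        # first term of (12.3)
variance = sum((e - mean_est) ** 2 for e in estimates) / n_runs  # second term

print(model_error, bias_sq + variance)
```

With sample averages in place of expectations, the two printed numbers agree up to rounding, because the cross term vanishes exactly as in the derivation of (12.3).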

The bias error is caused by the restricted flexibility of the model. In 
practice most systems are quite complex, and the class of models typically 
applied is not capable of representing the system correctly. The only exception 
occurs when the true structure of the system is known, i.e., it is obtained as 
a result of the physical deliberations underlying the system being studied. 
Therefore, the bias error decreases as the model complexity increases. Since 
the model complexity is related to the number of parameters, the bias error 
depends qualitatively on it. From the above it is clear that the bias error 
represents a systematic deviation between the system and the model that 
exists due to the model structure. 

The variance error is caused by the deviation of the estimated parameters 
from their optimal values. Indeed, since model parameters are estimated 
based on finite and noisy data, they usually deviate from their optimal 
values. The variance error describes that part of the model error which comes 
from parameter uncertainty. Undoubtedly, the fewer parameters the model 
possesses, the more accurately they can be estimated using the identification 
data. Thus, the variance error increases with an increase in the 
number of parameters. 

From the above discussion it is clear that a compromise between the bias 
and the variance error should be established (bias/variance trade-off). This 
can be achieved by an appropriate selection of model complexity. 

It can be observed during model determination that the error on the 
identification data (which is approximately equal to the bias error) decreases 
as the model complexity increases. On the other hand, the error on the 
validation data (which is equal to the bias error plus variance error) starts to 
rise again above the point of optimal complexity. If this effect is ignored, it 
may lead to a model that is either overly complex (overfitting: low bias, high 
variance) or too simple (underfitting: high bias, low variance). In order to 
avoid overfitting, a complexity penalty term should be introduced. This results 
in information criteria that reflect both the value of a cost function and the 
complexity. There are, of course, many different information criteria, e.g., 
Akaike’s, Bayesian, Khinchin’s law of iterated logarithm criterion, the final 
prediction error criterion, structural risk minimisation (Nelles, 2001). 
However, they all in one way or another implement the following structure: 

INFORMATION CRITERIA = IC(COST FUNCTION, MODEL COMPLEXITY). 

The introduction of such criteria makes it possible to avoid data splitting, 
i.e., dividing the data set into the identification and validation data sets. In 
this case, the entire data set can be used for parameter estimation. This is 
especially important when only a small data set is available. On the other 
hand, it seems profitable to use, if possible, such criteria together with the 
validation data set. 

The Akaike Information Criterion (AIC) (Walter and Pronzato, 1997) is 
one of the best known criteria, which can be employed to select the model 
structure and to estimate its parameters: 

$$\hat{M}(\hat{p}) = \arg \min_{M_i \in \mathcal{M}} \min_{p^i} J_{AIC}(M_i(p^i)). \tag{12.4}$$

In order to formulate a detailed description of (12.4), it is convenient to 
assume that the data satisfy 

$$y_k = \hat{y}_k(M^*(p^*)) + v_k, \quad k = 1, \ldots, n_t, \tag{12.5}$$

where $M^*$ denotes the correct model structure, $p^*$ is the true value of its 
parameters, and $v_k$ is a sequence of random independent variables assumed 
to be normally distributed. The determination of $\hat{M}$ can thus be realised, 
similarly to the way it was done in (Walter and Pronzato, 1997), as follows: 

Using the validation data set obtain 

$$\hat{M} = \arg \min_{M_i \in \mathcal{M}} J_M(M_i), \tag{12.6}$$

$$J_M(M_i) = \frac{1}{2}\, j(M_i(\hat{p}^i)) + \frac{1}{n_t} \dim p^i, \tag{12.7}$$

where 

$$\hat{p}^i = \arg \min_{p^i} j(M_i(p^i)), \quad i = 1, \ldots, n_m, \tag{12.8}$$

are obtained using the identification data set $\{(y_k, u_k)\}_{k=0}^{n_t-1}$: 

$$j(M_i(p^i)) = \ln \det \sum_{k=0}^{n_t-1} \varepsilon_k(M_i(p^i))\, \varepsilon_k^T(M_i(p^i)), \tag{12.9a}$$

$$\varepsilon_k = y_k - \hat{y}_k(M_i(p^i)), \tag{12.9b}$$

and, consequently, $\hat{p}^i$, which corresponds to the best model structure 
$\hat{M} = M_i$, is chosen as $\hat{p}$. 

The model determination process can then be realised as follows: 

Step 0: Select the set of possible model structures M. 

Step 1: Estimate the parameters of each of the models $M_i$, $i = 1, \ldots, n_m$, 
according to (12.9a). 

Step 2: Select the model which is best suited in terms of the criterion (12.6). 

Step 3: If the selected model does not satisfy the prespecified requirements, 
then go to Step 0. 
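
Steps 0 to 2 can be sketched for a single-output system, where the determinant in the criterion reduces to the sum of squared errors. Everything specific in this sketch is an assumption made for illustration: the candidate structures are static polynomials of the input, the data are simulated, one data set is used for both estimation and selection, and the helper names (solve, fit_poly, criterion) are hypothetical.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= factor * m[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][j] * x[j] for j in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_poly(u, y, degree):
    """Least-squares polynomial coefficients via the normal equations."""
    cols = degree + 1
    a = [[sum(ui ** (i + j) for ui in u) for j in range(cols)] for i in range(cols)]
    b = [sum(yi * ui ** i for ui, yi in zip(u, y)) for i in range(cols)]
    return solve(a, b)

def predict(p, ui):
    return sum(pj * ui ** j for j, pj in enumerate(p))

def criterion(u, y, p):
    """0.5 * ln(sum of squared errors) + dim(p) / n_t for a scalar output."""
    sse = sum((yi - predict(p, ui)) ** 2 for ui, yi in zip(u, y))
    return 0.5 * math.log(sse) + len(p) / len(y)

random.seed(1)
u = [k / 49 for k in range(50)]
y = [1.0 + 2.0 * ui + random.gauss(0.0, 0.05) for ui in u]   # simulated data

scores = {}
for degree in range(5):                  # Step 0: candidate structures
    p_hat = fit_poly(u, y, degree)       # Step 1: parameter estimation
    scores[degree] = criterion(u, y, p_hat)
best = min(scores, key=scores.get)       # Step 2: structure selection
print(best, scores)
```

Step 3 is not automated here; in practice it would mean inspecting the selected model against the prespecified requirements and, if necessary, enlarging the candidate set.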

12.2.3. Input-output representation of the system 

The characterisation of a set of possible candidate models $\mathcal{M}$ (cf. 
Section 12.2.2) from which the system model will be obtained constitutes an 
important preliminary task in any system identification procedure. Knowing 
that the system exhibits a non-linear characteristic, the choice of a non-linear 
model set must be made. Let a non-linear Multi-Input and Multi-Output 
(MIMO) model have the following form: 

$$\hat{y}_{i,k} = g_i(y_{1,k-1}, \ldots, y_{1,k-n_{1,y}}, \ldots, y_{m,k-1}, \ldots, y_{m,k-n_{m,y}}, u_{1,k-1}, \ldots, u_{1,k-n_{1,u}}, \ldots, u_{r,k-1}, \ldots, u_{r,k-n_{r,u}}, p_i), \quad i = 1, \ldots, m. \tag{12.10}$$

Thus the system output is given by 

$$y_k = \hat{y}_k + \varepsilon_k, \tag{12.11}$$

where $\varepsilon_k$ consists of a structural deterministic error, caused by the 
model-reality mismatch, and the measurement noise $v_k$. The problem is to 
determine an unknown function $g(\cdot) = (g_1(\cdot), \ldots, g_m(\cdot))$ and to 
estimate the corresponding parameter vector $p = (p_1^T, \ldots, p_m^T)^T$. 

One possible solution to this problem is the GP approach. As has already 
been mentioned, the main ingredient underlying the GP algorithm is a tree. 
In order to adapt GP to system identification, it is necessary to represent 
the model (12.10) either as a tree or as a set of trees. Indeed, as shown in 
Fig. 12.1, the Multi-Input and Single-Output (MISO) non-linear model can 
be easily put in the form of a tree, and hence to build the MIMO model 
(12.10) it is necessary to use $m$ trees. In such a tree (see Fig. 12.1), two 
sets can be distinguished, namely the terminal set $T$ and the function set $F$ 
(e.g., $T = \{u_{k-1}, u_{k-2}, y_{k-1}, y_{k-2}\}$, $F = \{+, *, /\}$). The language 
of the trees in GP is formed by the user-defined function set $F$ and terminal 
set $T$, which form the nodes of the trees. The functions should be chosen so 
as to be a priori useful in solving the problem, i.e., any knowledge concerning 
the system under consideration should be included in the function set. This 
function set is very important and should be universal enough to be capable 
of representing a wide range of non-linear systems. For the model structure 
determination problem it is natural to include the four ordinary arithmetic 
operations, i.e., $F = \{+, -, *, /\}$, mainly because these operators allow the 
creation of polynomials as well as the quotients of such polynomials. The 
choice of other operators and functions depends on the application and should 
be performed after a suitable analysis of system behaviour. 

Fig. 12.1. Exemplary tree representing the model $y_k = y_{k-1}u_{k-1} + y_{k-2}/u_{k-2}$ 

Terminals are usually variables or constants. Thus, the search space
consists of all possible compositions that can be recursively formed from the
elements of $F$ and $T$. The selection of variables does not cause any problems,
but the handling of numerical parameters (constants) is very difficult. Even
though no constant numerical values are in the terminal set $T$, they can be
implicitly generated, e.g., the number 0.5 can be expressed as $x/(x+x)$.
Unfortunately, such an approach leads to an increase in both the computational
burden and evolution time. Another way is to introduce a number of random
constants into the terminal set, but this is also an inefficient approach. An
alternative and more suitable way of handling numerical parameters is
called node gains (Esparcia-Alcazar, 1998). A node gain is a numerical
parameter associated with a node, whose output it multiplies (see Fig. 12.2).
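
The node-gain idea can be sketched as follows. Each node carries a gain that multiplies its output, so a constant such as 0.5 is stored directly as a parameter instead of being evolved as a sub-expression. The triple-based encoding and the example tree are hypothetical and serve only as an illustration.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul, "/": operator.truediv}


def evaluate(node, terminals):
    """Evaluate a parameterised tree; every node is (gain, symbol, children)."""
    gain, symbol, children = node
    if not children:                               # leaf: gain times variable
        return gain * terminals[symbol]
    left, right = (evaluate(child, terminals) for child in children)
    return gain * OPS[symbol](left, right)         # gain multiplies node output


# Hypothetical tree: 2.0 * (0.5 * x + 3.0 * u)
tree = (2.0, "+", ((0.5, "x", ()), (3.0, "u", ())))
value = evaluate(tree, {"x": 4.0, "u": 1.0})
```

In this example the three gains are not independent, since the root gain can be absorbed into the two leaf gains, which is precisely the identifiability issue that motivates parameter reduction rules.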




Fig. 12.2. Exemplary tree 



Although this technique is straightforward, it leads to an excessive number
of parameters, i.e., there are parameters which are not identifiable. Thus,
it is necessary to develop a mechanism which prevents such situations from
happening. First, let us define the function set $F = \{+, *, /, \xi_1(\cdot), \ldots, \xi_l(\cdot)\}$,
where $\xi_k(\cdot)$ is a non-linear univariate function. To tackle the parameter
reduction problem, a few simple rules can be established:

$*$, $/$: A node of the type $*$ or $/$ always has parameters set to unity on the
side of its successors. If a node of the above type is a root node of a
tree, then the parameter associated with it should be estimated.

$+$: A parameter associated with a node of the type $+$ is always equal to
unity. If its successor is not of the type $+$, then the parameter of the
successor should be estimated.

$\xi$: If a successor of a node of the type $\xi$ is a leaf of a tree or is of the type
$*$ or $/$, then the parameter of the successor should be estimated. If a
node of the type $\xi$ is a root of a tree, then the associated parameter
should be estimated.

As an example, consider the tree shown in Fig. 12.2. Following the
above rules, the resulting parameter vector has only five elements,
$p = (p_3, p_8, p_9, p_{10}, p_{11})$, and the resulting model is
$y_k = (p_{11} + p_8)y_{k-1} + (p_{10} + p_9)u_{k-1} + p_3 y_{k-1}/u_{k-1}$.
It is obvious that although these rules are not optimal
in the sense of parameter identifiability, their application significantly
reduces the dimension of the parameter vector, thus making the parameter
estimation process much easier. Moreover, the introduction of parameterised
trees reduces the terminal set to variables only, i.e., constants are no longer
necessary, and hence the terminal set is given by

$$T = \{y_{1,k-1}, \ldots, y_{1,k-n_{1,y}}, \ldots, y_{m,k-1}, \ldots, y_{m,k-n_{m,y}}, u_{1,k-1}, \ldots, u_{1,k-n_{1,u}}, \ldots, u_{r,k-1}, \ldots, u_{r,k-n_{r,u}}\}.$$

The remaining problem is to select appropriate lags in the input and output
signals of the model. Assuming that $n_y^{\max}$ and $n_u^{\max}$ are the maximum
lags in the output and input signals, the problem boils down to checking
$n_y^{\max} \times n_u^{\max}$ possible configurations, which is an extremely time-consuming
process. With a slight loss of generality, it is possible to assume that each
$n_{i,y} = n_{i,u} = n$. Thus the problem reduces to finding, through experiments,
such $n$ for which the model is the best replica of the system. It should also
be pointed out that a good starting point for the input and output lags
is to obtain them according to the approach presented by Nelles (2001,
pp. 574-576).

12.2.4. Tree structure determination using GP 

If the terminal and function sets are given, the populations of GP individuals
(trees) can be generated, i.e., the set $M$ of possible model structures is
created. An outline of the GP algorithm implemented in this work is shown
in Table 12.1. The algorithm works on a set of populations
$\mathbb{P} = \{P_i \mid i = 1, \ldots, n_p\}$, and the number of populations $n_p$ depends on the application,
e.g., in the case of the model (12.10) the number of populations is equal to
the dimension $m$ of the output vector, i.e., $n_p = m$. Each of the above
populations $P_i = \{b_{i,j} \mid j = 1, \ldots, n_m\}$ is composed of a set of $n_m$ trees
$b_{i,j}$. Since the number of populations is given, the GP algorithm can be
started (initiation) by randomly generating individuals, i.e., $n_m$ individuals are
created in each population whose trees are of a desired depth $n_d$.

The tree generating process can be performed in several different ways, 
resulting in trees of different shapes. The basic approaches are the full and 
grow methods (Koza, 1992). The full method generates trees for which the 
Table 12.1. Outline of the GP algorithm

I. Initiation

   A. Random generation $\mathbb{P}(0) = \{P_i(0) \mid i = 1, \ldots, n_p\}$.

   B. Fitness calculation $\Phi(\mathbb{P}(0)) = \{\Phi(P_i(0)) \mid i = 1, \ldots, n_p\}$.

   C. $t = 1$.

II. While $\iota(\mathbb{P}(t)) = \text{true}$ repeat

   A. Selection $\mathbb{P}'(t) = \{P_i'(t) = s_{n_s}(P_i(t)) \mid i = 1, \ldots, n_p\}$.

   B. Crossover $\mathbb{P}''(t) = \{P_i''(t) = r_{p_{\text{cross}}}(P_i'(t)) \mid i = 1, \ldots, n_p\}$.

   C. Mutation $\mathbb{P}'''(t) = \{P_i'''(t) = m_{p_{\text{mut}}}(P_i''(t)) \mid i = 1, \ldots, n_p\}$.

   D. Fitness calculation $\Phi(\mathbb{P}'''(t)) = \{\Phi(P_i'''(t)) \mid i = 1, \ldots, n_p\}$.

   E. New generation $\mathbb{P}(t+1) = \{P_i(t+1) = P_i'''(t) \mid i = 1, \ldots, n_p\}$.

   F. $t = t + 1$.



length of every nonbacktracking path from the root to an endpoint is equal
to the prespecified depth $n_d$. The grow method generates trees of various
shapes. The length of the path between the root and the endpoint is not
greater than the prespecified depth $n_d$. Because, in general,
the shape of the true solution is unknown, it seems desirable to combine
both of the above methods. Such a combination is called ramped half-and-half.
Moreover, it is assumed that the parameters (the elements of the vector $p$) of each
tree are initially set to unity (although it is possible to set the parameters
randomly).
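
The full and grow methods, and their combination, can be sketched as below. The tuple-based tree encoding, the terminal names, the probability with which grow closes a branch, and the particular way the depth limit is ramped from 1 to $n_d$ are all assumptions of this illustration, not details given in the text.

```python
import random

FUNCTIONS = ["+", "*", "/"]              # binary function set F (assumed)
TERMINALS = ["y1", "y2", "u1", "u2"]     # hypothetical terminal names


def generate(depth, method, rng):
    """Return a random tree: a terminal name or a tuple (op, left, right)."""
    if depth == 0:
        return rng.choice(TERMINALS)
    # 'grow' may close a branch early with a terminal; 'full' never does.
    p_terminal = len(TERMINALS) / (len(TERMINALS) + len(FUNCTIONS))
    if method == "grow" and rng.random() < p_terminal:
        return rng.choice(TERMINALS)
    return (rng.choice(FUNCTIONS),
            generate(depth - 1, method, rng),
            generate(depth - 1, method, rng))


def leaf_depths(tree, depth=0):
    """Depths of all endpoints (terminals) of a tree."""
    if isinstance(tree, str):
        return [depth]
    return leaf_depths(tree[1], depth + 1) + leaf_depths(tree[2], depth + 1)


def ramped_half_and_half(n_m, n_d, rng):
    """Alternate 'full' and 'grow' while ramping the depth limit from 1 to n_d."""
    population = []
    for j in range(n_m):
        depth = 1 + (j // 2) % n_d
        method = "full" if j % 2 == 0 else "grow"
        population.append(generate(depth, method, rng))
    return population


population = ramped_half_and_half(n_m=20, n_d=4, rng=random.Random(0))
```

Trees produced by the full method have all endpoints at exactly the depth limit, whereas grow only guarantees that no endpoint lies deeper than the limit.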

In the first step (fitness calculation), the estimation of the parameter
vector $p$ of each individual is performed according to (12.9a). In the case of
parameter estimation, many algorithms can be employed; more precisely, as
GP models are usually non-linear in their parameters, the choice reduces to
one of non-linear optimisation techniques. Unfortunately, because the models
are randomly generated, they can contain linearly dependent parameters
(even after the application of parameter reduction rules) and parameters
which have very little influence on the model output. In many cases, this
may lead to a very poor performance of gradient-based algorithms.

Owing to the above-mentioned problems, the spectrum of possible non-linear
optimisation techniques reduces to gradient-free techniques, which usually
require a large number of cost evaluations. On the other hand, the
application of stochastic gradient-free algorithms, apart from the simplicity of the
approach, decreases the chance of getting stuck in a local optimum, and hence it
may give more suitable parameter estimates. Based on numerous computer
experiments, it has been found that the extremely simple Adaptive Random
Search (ARS) algorithm (Walter and Pronzato, 1997) is especially well suited
for that purpose. The routine chooses the initial parameter vector $p^0$, e.g.,
$p^0 = \mathbf{1}$. After $q$ iterations, given the current best estimate $p^q$, a random
displacement vector $\Delta p$ is generated and the trial point $p^* = p^q + \Delta p$
is checked, with $\Delta p$ following a normal distribution with a zero mean and
covariance $\Sigma = \mathrm{diag}[\sigma_1, \ldots, \sigma_{\dim p}]$. If $j(M(p^*)) > j(M(p^q))$, then $p^*$ is
rejected and, consequently, $p^{q+1} = p^q$ is set; otherwise, $p^{q+1} = p^*$. The
adaptive strategy consists in repeatedly alternating two phases. During the
first one (variance selection), $\Sigma$ is selected from the sequence ${}^{1}\sigma, \ldots, {}^{5}\sigma$,
where ${}^{1}\sigma$ is set by the user in such a way as to allow an easy exploration
of the parameter space, and ${}^{i}\sigma = {}^{i-1}\sigma/10$, $i = 2, \ldots, 5$. In order to allow a
comparison to be drawn, all the possible ${}^{i}\sigma$'s are used $100/i$ times, starting
from the same initial value of $p$. The largest ${}^{i}\sigma$'s, designed to escape the local
minimum, are therefore used more often than the smaller ones. During the
second (exploration) phase, the most successful ${}^{i}\sigma$ is used to perform 100
random trials starting from the best $p$ obtained so far.
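
The two alternating phases can be sketched as follows. This is a simplified illustration rather than the implementation used in the work: a single standard deviation is applied to all components of $p$ (instead of a general diagonal $\Sigma$), "most successful" is interpreted as "reaching the lowest cost during variance selection", and the quadratic cost is a hypothetical stand-in for the model criterion.

```python
import random


def adaptive_random_search(cost, p0, sigma1, n_cycles=10, rng=None):
    """Minimise cost(p) by alternating variance selection and exploration."""
    rng = rng or random.Random()
    best_p, best_j = list(p0), cost(p0)
    sigmas = [sigma1 / 10 ** i for i in range(5)]    # sigma_i = sigma_{i-1} / 10

    def trial_step(p, j, sigma):
        """Gaussian displacement; the trial is rejected only if it is worse."""
        trial = [x + rng.gauss(0.0, sigma) for x in p]
        j_trial = cost(trial)
        return (trial, j_trial) if j_trial <= j else (p, j)

    for _ in range(n_cycles):
        # Variance selection: sigma_i is used 100/i times from the same start.
        winner = (best_j, best_p, sigmas[0])
        for i, sigma in enumerate(sigmas, start=1):
            p, j = best_p, best_j
            for _ in range(100 // i):
                p, j = trial_step(p, j, sigma)
            if j < winner[0]:
                winner = (j, p, sigma)
        best_j, best_p, sigma = winner
        # Exploration: 100 trials with the most successful sigma.
        for _ in range(100):
            best_p, best_j = trial_step(best_p, best_j, sigma)
    return best_p, best_j


def quadratic(p):
    """Hypothetical cost with its minimum at p = (3, 3)."""
    return sum((x - 3.0) ** 2 for x in p)


p_hat, j_hat = adaptive_random_search(quadratic, [1.0, 1.0], sigma1=1.0,
                                      rng=random.Random(1))
```

Since no gradient is used, the routine is indifferent to linearly dependent or weakly influential parameters, at the price of many cost evaluations per cycle.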

In the next step, using (12.7), the fitness of each model is obtained and
the best-suited model is selected with the use of (12.6). If the selected model
satisfies the prespecified requirements, then the algorithm is stopped. In the
second step, the selection process is applied to create a new intermediate
population of parent individuals. For that purpose, various approaches can
be employed, e.g., proportional selection, rank selection, tournament selection
(Koza, 1992; Michalewicz, 1996). The selection method used in the present
work is tournament selection, and it works as follows: select randomly $n_s$
models, i.e., trees which represent the models, and copy the best of them
into the intermediate set of models (intermediate populations). The above
procedure is repeated $n_m$ times.
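
Tournament selection as described above can be sketched in a few lines. The sketch assumes that a lower criterion value means a better model and that the $n_s$ contestants of one tournament are drawn without repetition; neither detail is fixed by the text, and the model labels and costs are hypothetical.

```python
import random


def tournament_selection(population, costs, n_s, rng):
    """Return an intermediate population built from len(population) tournaments."""
    n_m = len(population)
    selected = []
    for _ in range(n_m):
        contestants = rng.sample(range(n_m), n_s)           # n_s random models
        winner = min(contestants, key=lambda j: costs[j])   # best = lowest cost
        selected.append(population[winner])
    return selected


models = ["M1", "M2", "M3", "M4"]        # hypothetical model labels
costs = [3.0, 1.0, 2.0, 0.5]             # hypothetical criterion values
parents = tournament_selection(models, costs, n_s=4, rng=random.Random(0))
```

The tournament size $n_s$ controls the selection pressure: with $n_s$ equal to the population size only the best model survives, whereas $n_s = 1$ amounts to purely random selection.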

Individuals for the new population (the next generation) are produced
through the application of crossover and mutation. To apply crossover,
random couples of individuals which have the same position in each
population are formed. Then, with the probability $p_{\text{cross}}$, each couple undergoes
crossover, i.e., a random crossover point (node) is selected and then the
corresponding sub-trees are exchanged (Fig. 12.3). Mutation (Fig. 12.4) is
implemented such that for each entry of each individual a sub-tree at a
selected point is removed with the probability $p_{\text{mut}}$ and replaced with a randomly
generated tree. The parameter vectors of individuals which have been
modified by means of either crossover or mutation are set to unity (although another
choice is possible), and the other node parameter vectors remain unchanged.
The GP algorithm is repeated until the best-suited model satisfies the
prespecified requirements $\iota(\mathbb{P}(t))$, or until the maximum admissible number of
iterations has been exceeded.
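
The sub-tree exchange and replacement underlying crossover and mutation can be sketched as follows. The tuple-based tree encoding and the helper names are assumptions of this illustration; the probabilities $p_{\text{cross}}$ and $p_{\text{mut}}$ and the resetting of node parameters are left out for brevity.

```python
import random


def paths(tree, prefix=()):
    """Index paths of all nodes of a tree given as a name or (op, left, right)."""
    found = [prefix]
    if not isinstance(tree, str):
        found += paths(tree[1], prefix + (1,)) + paths(tree[2], prefix + (2,))
    return found


def get(tree, path):
    """Sub-tree rooted at the node addressed by path."""
    for index in path:
        tree = tree[index]
    return tree


def put(tree, path, subtree):
    """Copy of tree in which the node addressed by path is replaced by subtree."""
    if not path:
        return subtree
    node = list(tree)
    node[path[0]] = put(tree[path[0]], path[1:], subtree)
    return tuple(node)


def crossover(a, b, rng):
    """Exchange randomly selected sub-trees of the two parents."""
    path_a, path_b = rng.choice(paths(a)), rng.choice(paths(b))
    return put(a, path_a, get(b, path_b)), put(b, path_b, get(a, path_a))


def mutate(tree, new_subtree, rng):
    """Replace a randomly selected sub-tree by new_subtree."""
    return put(tree, rng.choice(paths(tree)), new_subtree)


parent_a = ("+", "y1", ("*", "u1", "y2"))
parent_b = ("/", "u2", "y1")
child_a, child_b = crossover(parent_a, parent_b, random.Random(0))
```

Because the parents are immutable tuples, both operators return new trees and leave the parents untouched, so unmodified individuals can be copied into the next generation safely.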
It should also be pointed out that the simulation programme must
ensure robustness to unstable models. This can be easily attained when (12.9a)
is bounded by a certain maximum admissible value. This means that each
individual which exceeds the above bound is penalised by stopping the
calculation of its fitness, and then $J_m(M_i)$ is set to a sufficiently large positive
number. This problem is especially important in the case of the input-output
representation of the system. Unfortunately, the stability of models resulting
from this approach is very difficult to prove. However, this is a common
problem with non-linear input-output dynamic models. To overcome it, an
alternative model structure is presented in the subsequent section.
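
A minimal sketch of such a safeguard is given below. It assumes that the bounded quantity is an accumulated sum of squared output errors and that the model is a first-order input-output map; the interface `model(y_prev, u_prev)`, the bound and the penalty value are hypothetical choices made for this illustration only.

```python
def bounded_cost(model, u, y, bound=1e6, penalty=1e12):
    """Sum of squared output errors of a simulated model, with early stopping.

    model(y_prev, u_prev) returns the next simulated output. Once the
    accumulated cost exceeds `bound`, the simulation is stopped and
    `penalty` is returned instead.
    """
    y_hat, cost = y[0], 0.0
    for k in range(1, len(y)):
        y_hat = model(y_hat, u[k - 1])
        error = y[k] - y_hat
        cost += error * error
        if not cost <= bound:          # also catches inf and nan
            return penalty
    return cost


u = [1.0] * 50
y = [0.0] + [1.0 - 0.5 ** k for k in range(1, 50)]   # y_k = 0.5 y_{k-1} + 0.5 u_{k-1}
stable = bounded_cost(lambda yp, up: 0.5 * yp + 0.5 * up, u, y)
unstable = bounded_cost(lambda yp, up: 10.0 * yp + up, u, y)
```

Stopping early keeps a diverging individual from producing overflow and also saves computation time, since such a model would be discarded anyway.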
12.2.5. State-space representation of the system

Let us consider the following class of non-linear discrete-time systems:

$$x_{k+1} = g(x_k, u_k) + w_k, \tag{12.12a}$$

$$y_{k+1} = C x_{k+1} + v_k. \tag{12.12b}$$

Assume that the function $g(\cdot)$ has the form

$$g(x_k, u_k) = A(x_k)x_k + h(u_k). \tag{12.13}$$

The choice of this structure is motivated by the fact that the resulting model
is to be used in FDI systems. With minor modifications, the algorithm presented
below can also be applied to the following structures of $g(\cdot)$:

$$g(x_k, u_k) = A(x_k, u_k)x_k, \tag{12.14}$$

$$g(x_k, u_k) = A(x_k, u_k)x_k + h(u_k), \tag{12.15}$$

$$g(x_k, u_k) = A(x_k, u_k)x_k + B(x_k)u_k, \tag{12.16}$$

$$g(x_k, u_k) = A(x_k)x_k + B(x_k)u_k. \tag{12.17}$$

The state-space model of the system (12.12a)-(12.12b) can be expressed as

$$\hat{x}_{k+1} = A(\hat{x}_k)\hat{x}_k + h(u_k), \tag{12.18a}$$

$$\hat{y}_{k+1} = C\hat{x}_{k+1}. \tag{12.18b}$$

The problem is to determine the matrices $A(\cdot)$, $C$ and the vector
$h(\cdot)$, given the sets of input-output measurements $\{(u_k, y_k)\}_{k=0}^{n_t-1}$ and
$\{(u_k, y_k)\}_{k=0}^{n_v-1}$. Moreover, it is assumed that the true state vector $x_k$ is,
in particular, unknown. Without loss of generality, it is possible to assume
that

$$A(\hat{x}_k) = \mathrm{diag}[a_{1,1}(\hat{x}_k), a_{2,2}(\hat{x}_k), \ldots, a_{n,n}(\hat{x}_k)]. \tag{12.19}$$

Thus, the problem reduces to identifying the non-linear functions
$a_{i,i}(\hat{x}_k)$, $h_i(u_k)$, $i = 1, \ldots, n$, and the matrix $C$. Now it is possible to
establish the conditions under which the model (12.18a)-(12.18b) is globally
asymptotically stable.

The following theorem is based on the theorems presented by Bubnicki (2000).

Theorem 12.1. If, for $h(u_k) = 0$,

$$\forall k \geq 0, \quad \forall \hat{x}_k \in \mathbb{R}^n, \quad \max_{i=1,\ldots,n} |a_{i,i}(\hat{x}_k)| < 1, \tag{12.20}$$

then the model (12.18a)-(12.18b) is globally asymptotically stable, i.e., $\hat{x}_k$
converges to the equilibrium point $\hat{x}^*$ for any $\hat{x}_0$.
Proof. Since the matrix $A(\hat{x}_k)$ is a diagonal one, then

$$\|A(\hat{x}_k)\| = \max_{i=1,\ldots,n} |\lambda_i(A(\hat{x}_k))| = \max_{i=1,\ldots,n} |a_{i,i}(\hat{x}_k)|, \tag{12.21}$$

where the norm $\|A(\cdot)\|$ may have one of the following forms:

$$\|A(\cdot)\|_2 = \sqrt{\lambda_{\max}\left(A(\cdot)^T A(\cdot)\right)}, \tag{12.22}$$

$$\|A(\cdot)\|_1 = \max_{1 \leq i \leq n} \sum_{j=1}^{n} |a_{i,j}(\cdot)|, \tag{12.23}$$

$$\|A(\cdot)\|_\infty = \max_{1 \leq j \leq n} \sum_{i=1}^{n} |a_{i,j}(\cdot)|. \tag{12.24}$$

Finally, using (Bubnicki, 2000, Proof of Theorem 1) yields the condition
(12.20). ■



As the stability conditions are established, it is possible to give
a general framework for the identification of (12.18a)-(12.18b). Since
$a_{i,i}(\hat{x}_k)$ and $h_i(u_k)$, $i = 1, \ldots, n$, are assumed to be non-linear (in general)
functions, it is necessary to use $n$ populations to represent $a_{i,i}(\hat{x}_k)$, $i = 1, \ldots, n$,
and another $n$ populations to represent $h_i(u_k)$, $i = 1, \ldots, n$. Thus the
number of populations is $n_p = 2n$. The terminal sets for these two kinds of
populations are different, i.e., the first terminal set is defined as $T_A = \{\hat{x}_k\}$,
and the second one as $T_h = \{u_k\}$. The parameter vector $p$ consists of the
parameters of both $a_{i,i}(\hat{x}_k)$ and $h_i(u_k)$. Unfortunately, the estimation of $p$
is not as simple as in the input-output representation case. This means that
checking the trial point in the ARS algorithm (see Section 12.2.4) involves
a computation of $C$, which is necessary to obtain the output error $\varepsilon_k$ and,
consequently, the value of the fitness function (12.6). To tackle this problem,
for each trial point $p$ it is necessary to first set an initial state estimate $\hat{x}_0$,
and then to obtain the state estimate $\hat{x}_k$, $k = 1, \ldots, n_t - 1$. Knowing the
state estimate and using the least-squares method, it is possible to obtain $C$
by solving the following equation:

$$C \sum_{k=0}^{n_t-1} \hat{x}_k \hat{x}_k^T = \sum_{k=0}^{n_t-1} y_k \hat{x}_k^T, \tag{12.25}$$

or, equivalently, by using

$$C = \sum_{k=0}^{n_t-1} y_k \hat{x}_k^T \left[\sum_{k=0}^{n_t-1} \hat{x}_k \hat{x}_k^T\right]^{-1}. \tag{12.26}$$
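
The least-squares step for $C$ can be illustrated for the smallest non-trivial case. The sketch below assumes a scalar output and a two-dimensional state estimate, so that the $2 \times 2$ matrix $\sum_k \hat{x}_k \hat{x}_k^T$ can be inverted in closed form; the data are hypothetical and generated from a known row vector so that the result can be checked.

```python
def estimate_c(states, outputs):
    """Least-squares row vector C for scalar outputs and 2-dimensional states.

    Solves C * sum(x x^T) = sum(y x^T) using the closed-form 2 x 2 inverse.
    """
    s11 = sum(x[0] * x[0] for x in states)
    s12 = sum(x[0] * x[1] for x in states)
    s22 = sum(x[1] * x[1] for x in states)
    b1 = sum(y * x[0] for x, y in zip(states, outputs))
    b2 = sum(y * x[1] for x, y in zip(states, outputs))
    det = s11 * s22 - s12 * s12
    return [(b1 * s22 - b2 * s12) / det, (b2 * s11 - b1 * s12) / det]


states = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, -1.0)]   # hypothetical estimates
outputs = [2.0 * x1 - 3.0 * x2 for x1, x2 in states]          # generated with C = [2, -3]
c_hat = estimate_c(states, outputs)
```

For higher state dimensions the same normal equations would be solved with a general linear solver; the closed-form inverse is used here only to keep the example free of dependencies.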



Since the identification procedure of (12.18) is given, it is possible to
establish the structure of $A(\cdot)$, which guarantees that the condition of Theorem 12.1
is always satisfied, i.e., $\max_{i=1,\ldots,n} |a_{i,i}(\hat{x}_k)| < 1$. This can be easily achieved
with the following structure of $a_{i,i}(\hat{x}_k)$:

$$a_{i,i}(\hat{x}_k) = \tanh(s_{i,i}(\hat{x}_k)), \quad i = 1, \ldots, n, \tag{12.27}$$

where $\tanh(\cdot)$ is a hyperbolic tangent function, and $s_{i,i}(\hat{x}_k)$ is a function
represented by the GP tree. It should also be pointed out that the order $n$ of
the model is in general unknown and hence should be determined through
experiments.
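
The effect of the structure (12.27) can be illustrated by simulating the unforced model, i.e., with $h(u_k) = 0$. The two functions below are hypothetical, bounded stand-ins for the GP trees $s_{1,1}(\cdot)$ and $s_{2,2}(\cdot)$; they are not taken from the text.

```python
import math


def step(x, s_funcs):
    """x_{k+1} = A(x_k) x_k with a_ii = tanh(s_ii(x_k)) and h(u_k) = 0."""
    return [math.tanh(s(x)) * xi for s, xi in zip(s_funcs, x)]


# Hypothetical bounded functions standing in for the GP trees s_11 and s_22.
s_funcs = [lambda x: 1.0 + 0.5 * math.sin(x[1]),
           lambda x: -0.8 * math.cos(x[0])]

x = [5.0, -3.0]
for _ in range(200):
    x = step(x, s_funcs)
```

Whatever the trees evaluate to, the hyperbolic tangent keeps every diagonal entry strictly inside $(-1, 1)$, so each state component is contracted at every step and the estimate decays towards the origin.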



12.3. Unknown input observers 

Regardless of the identification method used, there always exists the problem
of model uncertainty, i.e., the model-reality mismatch. To overcome it, many
approaches have been proposed (Chen and Patton, 1999; Patton et al., 2000).
Undoubtedly, the most common one is to use robust observers, such as the
unknown input observer (Alcorta et al., 1997; Chen and Patton, 1999; Chen
et al., 1996; Patton and Chen, 1997; Patton et al., 2000), which can tolerate
a degree of model uncertainty and hence increase the reliability of fault
diagnosis. Unfortunately, the design procedure for Non-linear Unknown Input
Observers (NUIOs) (Alcorta et al., 1997; Seliger and Frank, 2000) is usually
very complex, even for simple laboratory systems (Zolghadri et al., 1996).
One way out of this problem is to employ linearisation-based approaches,
similar to the Extended Kalman Filter (EKF) (Anderson and Moore, 1979). In this
case, the design procedure is almost as simple as that for linear systems. On
the other hand, it is well known that such a solution works well only when
there is no large mismatch between the model linearised around the current
state estimate and the non-linear behaviour of the system. Thus, the problem
is to improve the convergence of linearisation-based observers.

The application of the EKF to the state estimation of non-linear determin- 
istic systems has received considerable attention during the last two decades 
(see Boutayeb and Aubry, 1999, and the references therein). This is mainly 
because the EKF can be directly applied to a large class of non-linear systems. 
Moreover, it is possible to show that the convergence of such a deterministic 
observer is ensured under certain conditions. 

The main objective of further investigations is to show how to employ
a modified version of the well-known UIO, which can be applied to linear
stochastic systems, to form a non-linear deterministic observer. Moreover, it
is shown that the convergence of the proposed observer is ensured under
certain conditions (Witczak, 2001; 2001b; Witczak et al., 2002b), and that
the convergence rate can be dramatically increased, compared to the classical
approach, by the application of the genetic programming technique (Witczak
and Korbicz, 2001a; Witczak et al., 2002b). Finally, it is shown how to
use the proposed observer to tackle the problem of both sensor and actuator
fault diagnosis.
12.3.1. Preliminaries

This section presents a special version of the UIO, which can be employed to
tackle the fault detection problem of linear stochastic systems. Following a
common nomenclature, such a UIO will be called an Unknown Input Filter (UIF).
Let us consider the following linear discrete-time system:

$$x_{k+1} = A_kx_k + B_ku_k + E_kd_k + L_{1,k}f_k + w_k, \tag{12.28a}$$

$$y_{k+1} = C_{k+1}x_{k+1} + L_{2,k+1}f_{k+1} + v_{k+1}. \tag{12.28b}$$

In this case, $v_k$ and $w_k$ are independent zero-mean white noise sequences.
The matrices $A_k$, $B_k$, $C_k$, $E_k$ are assumed to be known and to have
appropriate dimensions. As has already been mentioned, robustness to model
uncertainty and to other factors which may lead to unreliable fault detection
is of great importance. In the case of the UIF, the robustness problem is
tackled by introducing the concept of the unknown input $d_k$; hence the term
$E_kd_k$ may represent various kinds of modelling uncertainty, as well as real
disturbances affecting the real system.

To overcome the state estimation problem of (12.28a)-(12.28b), a UIF
with the following structure can be employed:

$$z_{k+1} = F_{k+1}z_k + T_{k+1}B_ku_k + K_{k+1}y_k, \tag{12.29}$$

$$\hat{x}_{k+1} = z_{k+1} + H_{k+1}y_{k+1}, \tag{12.30}$$

where

$$K_{k+1} = K_{1,k+1} + K_{2,k+1}, \tag{12.31}$$

$$E_k = H_{k+1}C_{k+1}E_k, \tag{12.32}$$

$$T_{k+1} = I - H_{k+1}C_{k+1}, \tag{12.33}$$

$$F_{k+1} = T_{k+1}A_k - K_{1,k+1}C_k. \tag{12.34}$$

The above matrices are designed in such a way as to ensure unknown input
decoupling as well as the minimisation of the state estimation error:

$$e_{k+1} = x_{k+1} - \hat{x}_{k+1}. \tag{12.35}$$

It should also be pointed out that the necessary condition for the existence
of a solution to (12.32) is $\mathrm{rank}(C_{k+1}E_k) = \mathrm{rank}(E_k)$ (Chen and Patton,
1999, p. 72, Lemma 3.1), and a special solution is

$$H_{k+1}^* = E_k\left[(C_{k+1}E_k)^T C_{k+1}E_k\right]^{-1}(C_{k+1}E_k)^T. \tag{12.36}$$
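
The decoupling property of the special solution (12.36) can be checked numerically. The sketch below is restricted to a single unknown input, so that $(C_{k+1}E_k)^T C_{k+1}E_k$ is a scalar and no matrix inversion routine is needed; the matrices are hypothetical.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def decoupling_matrices(c_next, e):
    """H = E [(C E)^T C E]^{-1} (C E)^T and T = I - H C for a one-column E."""
    ce = matmul(c_next, e)                               # m x 1
    scale = sum(row[0] ** 2 for row in ce)               # (C E)^T C E is scalar
    h = matmul(e, [[row[0] / scale for row in ce]])      # n x m
    hc = matmul(h, c_next)
    n = len(e)
    t = [[(1.0 if i == j else 0.0) - hc[i][j] for j in range(n)] for i in range(n)]
    return h, t


C = [[1.0, 0.0], [0.0, 1.0]]     # hypothetical C_{k+1}
E = [[1.0], [2.0]]               # hypothetical E_k
H, T = decoupling_matrices(C, E)
TE = matmul(T, E)                # T E = 0: the unknown input is decoupled
HCE = matmul(matmul(H, C), E)    # H C E = E, i.e., condition (12.32)
```

Since $T_{k+1}E_k = E_k - H_{k+1}C_{k+1}E_k$, the condition (12.32) is exactly the statement that the unknown input does not enter the estimation error dynamics.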

If the conditions (12.31)-(12.34) are fulfilled, then the fault-free, i.e., $f_k = 0$,
state estimation error is given by

$$e_{k+1} = F_{k+1}e_k - K_{1,k+1}v_k - H_{k+1}v_{k+1} + T_{k+1}w_k. \tag{12.37}$$
In order to obtain the gain matrix $K_{1,k+1}$, let us first define the state
estimation covariance matrix:

$$P_k = \mathcal{E}\left\{[x_k - \hat{x}_k][x_k - \hat{x}_k]^T\right\}. \tag{12.38}$$

Using (12.37), the update of (12.38) can be defined as

$$\begin{aligned} P_{k+1} ={}& A_{1,k+1}P_kA_{1,k+1}^T + T_{k+1}Q_kT_{k+1}^T + H_{k+1}R_{k+1}H_{k+1}^T \\ & - K_{1,k+1}C_kP_kA_{1,k+1}^T - A_{1,k+1}P_kC_k^TK_{1,k+1}^T \\ & + K_{1,k+1}\left[C_kP_kC_k^T + R_k\right]K_{1,k+1}^T, \end{aligned} \tag{12.39}$$

where

$$A_{1,k+1} = T_{k+1}A_k. \tag{12.40}$$

To give the state estimation error $e_{k+1}$ the minimum variance, it can be
shown that the gain matrix $K_{1,k+1}$ should be determined by

$$K_{1,k+1} = A_{1,k+1}P_kC_k^T\left[C_kP_kC_k^T + R_k\right]^{-1}. \tag{12.41}$$

In this case, the corresponding covariance matrix is given by

$$P_{k+1} = A_{1,k+1}P_{k+1}'A_{1,k+1}^T + T_{k+1}Q_kT_{k+1}^T + H_{k+1}R_{k+1}H_{k+1}^T, \tag{12.42}$$

$$P_{k+1}' = P_k - K_{1,k+1}C_kP_kA_{1,k+1}^T. \tag{12.43}$$

The above derivation is very similar to that which has to be performed
for the classical Kalman filter (Anderson and Moore, 1979). Indeed, the UIF
can be transformed to a KF-like form as follows:

$$\begin{aligned} \hat{x}_{k+1} ={}& A_k\hat{x}_k + B_ku_k - H_{k+1}C_{k+1}\left[A_k\hat{x}_k + B_ku_k\right] \\ & - K_{1,k+1}C_k\hat{x}_k - F_{k+1}H_ky_k \\ & + \left[K_{1,k+1} + F_{k+1}H_k\right]y_k + H_{k+1}y_{k+1}, \end{aligned} \tag{12.44}$$

or

$$\hat{x}_{k+1} = \hat{x}_{k+1/k} + H_{k+1}\varepsilon_{k+1/k} + K_{1,k+1}\varepsilon_k, \tag{12.45}$$

where

$$\hat{x}_{k+1/k} = A_k\hat{x}_k + B_ku_k, \tag{12.46}$$

$$\varepsilon_{k+1/k} = y_{k+1} - \hat{y}_{k+1/k} = y_{k+1} - C_{k+1}\hat{x}_{k+1/k}, \tag{12.47}$$

$$\varepsilon_k = y_k - \hat{y}_k. \tag{12.48}$$

The above transformation can be performed by substituting (12.30) into
(12.29), and then using (12.33) and (12.34). As can be seen, the structure
of the observer (12.45) is very similar to that of the Kalman filter. The only
difference is the term $H_{k+1}\varepsilon_{k+1/k}$, which vanishes when no unknown input
is considered.
12.3.2. Extended unknown input observer

As has already been mentioned, the application of the EKF to the state 
estimation of non-linear deterministic systems has received considerable at- 
tention during the last two decades (see Boutayeb and Aubry, 1999, and the 
references therein). This is mainly because the EKF can be directly applied 
to a large class of non-linear systems, and its implementation procedure is 
almost as simple as that for linear systems. Moreover, in the case of deter- 
ministic systems, the instrumental matrices Rk and Qk can be set almost 
arbitrarily. This opportunity makes it possible to use them to improve the 
convergence of the observer, which is the main drawback to linearisation- 
based approaches. 

This section presents an Extended Unknown Input Observer (EUIO) for a class
of non-linear systems which can be modelled by the following equations
(Witczak, 2001; Witczak and Korbicz, 2001a; 2001b; Witczak et al., 2002b):

$$x_{k+1} = g(x_k) + h(u_k + L_{1,k}f_k) + E_kd_k, \tag{12.49a}$$

$$y_{k+1} = C_{k+1}x_{k+1} + L_{2,k+1}f_{k+1}, \tag{12.49b}$$

where $g(x_k)$ is assumed to be continuously differentiable with respect to $x_k$.
Similarly to the EKF, the observer (12.45) can be extended to the class of
non-linear systems (12.49). With minor modifications, the algorithm presented
below can also be applied to a more general structure; the restriction
adopted here is caused by the need to employ it for FDI purposes. This leads
to the following structure of the EUIO:

$$\hat{x}_{k+1/k} = g(\hat{x}_k) + h(u_k), \tag{12.50a}$$

$$\hat{x}_{k+1} = \hat{x}_{k+1/k} + H_{k+1}\varepsilon_{k+1/k} + K_{1,k+1}\varepsilon_k. \tag{12.50b}$$

It should also be pointed out that the matrix $A_k$ used in (12.40) is now
defined by

$$A_k = \left.\frac{\partial g(x_k)}{\partial x_k}\right|_{x_k = \hat{x}_k}. \tag{12.51}$$



12.3.3. Convergence of the EUIO 

In this section the Lyapunov approach is employed for the convergence
analysis of the EUIO. The approach presented here is similar to that described in
(Boutayeb and Aubry, 1999), which was used in the case of the EKF-based
deterministic observer. The main objective of this section is to show that
the convergence of the EUIO strongly depends on the appropriate choice of
the instrumental matrices $R_k$ and $Q_k$. Subsequently, the fault-free mode is
assumed, i.e., $f_k = 0$.

For notational convenience, let us define the a priori state estimation
error:

$$e_{k+1/k} = x_{k+1} - \hat{x}_{k+1/k}. \tag{12.52}$$
Substituting (12.49) and (12.50b) into (12.35), one can obtain the following
form of the state estimation error:

$$e_{k+1} = e_{k+1/k} - H_{k+1}\varepsilon_{k+1/k} - K_{1,k+1}\varepsilon_k. \tag{12.53}$$

As usual, to perform further derivations, it is necessary to linearise the
model around the current state estimate $\hat{x}_k$. This leads directly to the
classical approximation:

$$e_{k+1/k} \approx A_ke_k + E_kd_k. \tag{12.54}$$

In order to avoid the above approximation, the diagonal matrix
$\alpha_k = \mathrm{diag}(\alpha_{1,k}, \ldots, \alpha_{n,k})$ is introduced, which makes it possible to establish the
following exact equality:

$$e_{k+1/k} = \alpha_kA_ke_k + E_kd_k, \tag{12.55}$$

and hence (12.53) can be expressed as

$$\begin{aligned} e_{k+1} &= e_{k+1/k} - H_{k+1}C_{k+1}e_{k+1/k} - K_{1,k+1}C_ke_k \\ &= \left[I - H_{k+1}C_{k+1}\right]\left[\alpha_kA_ke_k + E_kd_k\right] - K_{1,k+1}C_ke_k \\ &= \left[T_{k+1}\alpha_kA_k - K_{1,k+1}C_k\right]e_k. \end{aligned} \tag{12.56}$$

The main objective of further consideration is to determine the
conditions under which the sequence $\{V_k\}_{k=1}^{\infty}$, defined by the Lyapunov candidate
function, i.e.,

$$V_{k+1} = e_{k+1}^TA_{1,k+1}^{-T}\left[P_{k+1}'\right]^{-1}A_{1,k+1}^{-1}e_{k+1}, \tag{12.57}$$

is a decreasing one. It should be pointed out that the Lyapunov function
(12.57) involves a very restrictive assumption regarding an inverse of the
matrix $A_{1,k+1}$. Indeed, from (12.40) and (12.33), (12.32) it is clear that the
matrix $A_{1,k+1}$ is singular when $E_k \neq 0$. Thus, the convergence conditions
can be formally obtained only when $E_k = 0$. This means that the practical
solution regarding the choice of the instrumental matrices $Q_k$ and $R_k$ can
be obtained when $E_k = 0$ and generalised to other cases, i.e., when $E_k \neq 0$.

First, let us define an alternative form of $K_{1,k+1}$ and the inverse of $P_{k+1}'$.
Substituting (12.43) into (12.42) and then comparing it with (12.39), one can
obtain

$$A_{1,k+1}K_{1,k+1}C_kP_kA_{1,k+1}^T = K_{1,k+1}C_kP_k. \tag{12.58}$$

Next, from (12.58), (12.43) and (12.41), we have that the gain matrix is

$$K_{1,k+1} = A_{1,k+1}P_{k+1}'C_k^TR_k^{-1}. \tag{12.59}$$

Similarly, using the matrix inversion lemma, from (12.58) and (12.43) we have
that the inverse of $P_{k+1}'$ is

$$\left[P_{k+1}'\right]^{-1} = P_k^{-1} + C_k^TR_k^{-1}C_k. \tag{12.60}$$
Substituting (12.56) and then (12.59) and (12.60) into (12.57), the
Lyapunov candidate function is

$$\begin{aligned} V_{k+1} = e_k^T\Big[ & A_k^T\alpha_kA_k^{-T}P_k^{-1}A_k^{-1}\alpha_kA_k \\ & + A_k^T\alpha_kA_k^{-T}C_k^TR_k^{-1}C_kA_k^{-1}\alpha_kA_k \\ & - A_k^T\alpha_kA_k^{-T}C_k^TR_k^{-1}C_k - C_k^TR_k^{-1}C_kA_k^{-1}\alpha_kA_k \\ & + C_k^TR_k^{-1}C_kP_{k+1}'C_k^TR_k^{-1}C_k\Big]e_k. \end{aligned} \tag{12.61}$$

Let

$$G = A_k^{-1}\alpha_kA_k \quad \text{and} \quad L = C_k^TR_k^{-1}C_k. \tag{12.62}$$

Then

$$G^TLG - G^TL - LG = \left[G^T - I\right]L\left[G - I\right] - L. \tag{12.63}$$
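
The identity (12.63) can be verified by expanding the product on its right-hand side. The following short check uses only $G$ and $L$ as defined in (12.62):

$$\left[G^T - I\right]L\left[G - I\right] - L = G^TLG - G^TL - LG + L - L = G^TLG - G^TL - LG.$$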

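Using (12.63) and (12.41), the expression (12.61) becomes

$$\begin{aligned} V_{k+1} = e_k^T\Big[ & A_k^T\alpha_kA_k^{-T}P_k^{-1}A_k^{-1}\alpha_kA_k \\ & + \left[A_k^T\alpha_kA_k^{-T} - I\right]C_k^TR_k^{-1}C_k\left[A_k^{-1}\alpha_kA_k - I\right] \\ & - C_k^TR_k^{-1}\left[I - C_kP_kC_k^T\left[C_kP_kC_k^T + R_k\right]^{-1}\right]C_k\Big]e_k. \end{aligned} \tag{12.64}$$

Using in (12.64) the identity

$$I = \left[C_kP_kC_k^T + R_k\right]\left[C_kP_kC_k^T + R_k\right]^{-1}, \tag{12.65}$$

the Lyapunov candidate function can be written as

$$\begin{aligned} V_{k+1} = e_k^T\Big[ & A_k^T\alpha_kA_k^{-T}P_k^{-1}A_k^{-1}\alpha_kA_k \\ & + \left[A_k^T\alpha_kA_k^{-T} - I\right]C_k^TR_k^{-1}C_k\left[A_k^{-1}\alpha_kA_k - I\right] \\ & - C_k^T\left[C_kP_kC_k^T + R_k\right]^{-1}C_k\Big]e_k. \end{aligned} \tag{12.66}$$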
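
The step from (12.64) to (12.66) can be made explicit. Writing $S = C_kP_kC_k^T + R_k$ (a shorthand introduced here only for brevity) and inserting $I = SS^{-1}$ gives

$$R_k^{-1}\left[I - C_kP_kC_k^TS^{-1}\right] = R_k^{-1}\left[S - C_kP_kC_k^T\right]S^{-1} = R_k^{-1}R_kS^{-1} = S^{-1},$$

so that the last term of (12.64) reduces to $-C_k^T\left[C_kP_kC_k^T + R_k\right]^{-1}C_k$.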

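The sequence $\{V_k\}_{k=1}^{\infty}$ is a decreasing one when there exists a scalar
$\zeta$, $0 < \zeta < 1$, such that

$$V_{k+1} - (1 - \zeta)V_k \leq 0. \tag{12.67}$$

Using (12.66), the inequality (12.67) becomes

$$V_{k+1} - (1 - \zeta)V_k = e_k^TX_ke_k + e_k^TY_ke_k \leq 0, \tag{12.68}$$

where

$$X_k = A_k^T\alpha_kA_k^{-T}P_k^{-1}A_k^{-1}\alpha_kA_k - (1 - \zeta)A_{1,k}^{-T}\left[P_k'\right]^{-1}A_{1,k}^{-1} \tag{12.69}$$

and

$$Y_k = \left[A_k^T\alpha_kA_k^{-T} - I\right]C_k^TR_k^{-1}C_k\left[A_k^{-1}\alpha_kA_k - I\right] - C_k^T\left[C_kP_kC_k^T + R_k\right]^{-1}C_k. \tag{12.70}$$

In order to satisfy (12.68), the matrices $X_k$ and $Y_k$ should be negative
semi-definite. This is equivalent to

$$\bar{\sigma}\left(A_k^T\alpha_kA_k^{-T}P_k^{-1}A_k^{-1}\alpha_kA_k\right) \leq \underline{\sigma}\left((1 - \zeta)A_{1,k}^{-T}\left[P_k'\right]^{-1}A_{1,k}^{-1}\right), \tag{12.71}$$

$$\bar{\sigma}\left(\left[A_k^T\alpha_kA_k^{-T} - I\right]C_k^TR_k^{-1}C_k\left[A_k^{-1}\alpha_kA_k - I\right]\right) \leq \underline{\sigma}\left(C_k^T\left[C_kP_kC_k^T + R_k\right]^{-1}C_k\right), \tag{12.72}$$

where $\bar{\sigma}(\cdot)$ and $\underline{\sigma}(\cdot)$ denote the maximum and minimum singular values,
respectively. The inequalities (12.71) and (12.72) determine the bounds of the
diagonal matrix $\alpha_k$, for which the condition (12.68) is satisfied. The objective
of further analysis is to obtain a more convenient form of the above bounds.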
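Using the fact that

$$\bar{\sigma}\left(A_k^T\alpha_kA_k^{-T}P_k^{-1}A_k^{-1}\alpha_kA_k\right) \leq \bar{\sigma}^2(A_k)\,\bar{\sigma}^2(A_k^{-1})\,\bar{\sigma}^2(\alpha_k)\,\bar{\sigma}(P_k^{-1}) = \frac{\bar{\sigma}^2(A_k)\,\bar{\sigma}^2(\alpha_k)}{\underline{\sigma}^2(A_k)\,\underline{\sigma}(P_k)}, \tag{12.73}$$

the expression (12.71) becomes

$$\bar{\sigma}(\alpha_k) \leq \gamma_1 = \frac{\underline{\sigma}(A_k)}{\bar{\sigma}(A_k)}\left(\frac{(1 - \zeta)\,\underline{\sigma}(P_k)}{\bar{\sigma}\left(A_{1,k}P_k'A_{1,k}^T\right)}\right)^{\frac{1}{2}}. \tag{12.74}$$

Similarly, using

$$\bar{\sigma}\left(A_k^{-1}\alpha_kA_k - I\right) = \bar{\sigma}\left(A_k^{-1}\left[\alpha_k - I\right]A_k\right) \leq \frac{\bar{\sigma}(A_k)}{\underline{\sigma}(A_k)}\,\bar{\sigma}(\alpha_k - I), \tag{12.75}$$

$$\bar{\sigma}\left(\left[A_k^T\alpha_kA_k^{-T} - I\right]C_k^TR_k^{-1}C_k\left[A_k^{-1}\alpha_kA_k - I\right]\right) \leq \frac{\bar{\sigma}^2(A_k)}{\underline{\sigma}^2(A_k)}\,\frac{\bar{\sigma}(C_k^T)\,\bar{\sigma}(C_k)}{\underline{\sigma}(R_k)}\,\bar{\sigma}^2(\alpha_k - I), \tag{12.76}$$

and

$$\underline{\sigma}\left(C_k^T\left[C_kP_kC_k^T + R_k\right]^{-1}C_k\right) \geq \frac{\underline{\sigma}(C_k^T)\,\underline{\sigma}(C_k)}{\bar{\sigma}\left(C_kP_kC_k^T + R_k\right)}, \tag{12.77}$$

the expression (12.72) becomes

$$\bar{\sigma}(\alpha_k - I) \leq \gamma_2 = \frac{\underline{\sigma}(A_k)}{\bar{\sigma}(A_k)}\left(\frac{\underline{\sigma}(C_k^T)\,\underline{\sigma}(C_k)\,\underline{\sigma}(R_k)}{\bar{\sigma}(C_k^T)\,\bar{\sigma}(C_k)\,\bar{\sigma}\left(C_kP_kC_k^T + R_k\right)}\right)^{\frac{1}{2}}. \tag{12.78}$$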
Bearing in mind that $\alpha_k$ is a diagonal matrix, the above inequalities can
be expressed as

$$\max_{i=1,\ldots,n} |\alpha_{i,k}| \leq \gamma_1 \quad \text{and} \quad \max_{i=1,\ldots,n} |\alpha_{i,k} - 1| \leq \gamma_2. \tag{12.79}$$

Since

$$P_k = A_{1,k}P_k'A_{1,k}^T + T_kQ_{k-1}T_k^T + H_kR_kH_k^T, \tag{12.80}$$

it is clear that an appropriate selection of the instrumental matrices $Q_{k-1}$
and $R_k$ may enlarge the bounds $\gamma_1$ and $\gamma_2$, and, consequently, the domain of
attraction. Indeed, if the conditions (12.79) are satisfied, then $\hat{x}_k$ converges
to $x_k$.



12.3.4. Increasing the convergence rate via genetic programming 

Unfortunately, the analytical derivation of the matrices $Q_{k-1}$ and $R_k$ is
an extremely difficult problem. However, it is possible to set the above
matrices as follows: $Q_{k-1} = \beta_1I$, $R_k = \beta_2I$, with $\beta_1$ and $\beta_2$ large enough.
On the other hand, it is well known that the convergence rate of such an
EKF-like approach can be increased by an appropriate selection of the
covariance matrices $Q_{k-1}$ and $R_k$, i.e., the more accurate (near true values)
the covariance matrices, the better the convergence rate. This means that in
the deterministic case ($w_k = 0$ and $v_k = 0$), both matrices should be zero
ones. Unfortunately, such an approach usually leads to the divergence of the
observer and to other computational problems. To tackle this, a compromise
between the convergence and the convergence rate should be established. This
can be easily done by setting the instrumental matrices as

$$Q_{k-1} = \beta_1\varepsilon_{k-1}^T\varepsilon_{k-1}I + \delta_1I, \quad R_k = \beta_2\varepsilon_k^T\varepsilon_kI + \delta_2I, \tag{12.81}$$

with $\beta_1$, $\beta_2$ large enough, and $\delta_1$, $\delta_2$ small enough.
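
Since both instrumental matrices in (12.81) are scalar multiples of the identity, it suffices to compute the two scalars. The following sketch does so; the numerical values of $\beta_1$, $\beta_2$, $\delta_1$, $\delta_2$ and of the output errors are hypothetical.

```python
def instrumental_scalars(eps_prev, eps, beta1, beta2, delta1, delta2):
    """Return (q, r) such that Q_{k-1} = q I and R_k = r I as in (12.81)."""
    q = beta1 * sum(e * e for e in eps_prev) + delta1
    r = beta2 * sum(e * e for e in eps) + delta2
    return q, r


# Large output error at k-1, vanishing output error at k (hypothetical values).
q, r = instrumental_scalars(eps_prev=[1.0, 2.0], eps=[0.0, 0.0],
                            beta1=1000.0, beta2=1000.0,
                            delta1=0.01, delta2=0.01)
```

A large output error thus inflates the instrumental matrices, which favours convergence, whereas a vanishing output error lets them shrink towards the small values $\delta_1 I$ and $\delta_2 I$, which favours the convergence rate.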

Although this approach is very simple, it is possible to increase the
convergence rate further. Indeed, the instrumental matrices can be set as follows:

$$Q_{k-1} = q^2(\varepsilon_{k-1})I + \delta_1I, \quad R_k = r^2(\varepsilon_k)I + \delta_2I, \tag{12.82}$$

where $q(\varepsilon_{k-1})$ and $r(\varepsilon_k)$ are non-linear functions of the output error $\varepsilon_k$
(the squares are used to ensure the positive definiteness of $Q_{k-1}$ and $R_k$).
Thus, the problem reduces to identifying the above functions. To tackle it,
genetic programming can be employed. The unknown functions $q(\varepsilon_{k-1})$ and
$r(\varepsilon_k)$ can be expressed as trees, as shown in Fig. 12.5. Thus, in the case of
$q(\cdot)$ and $r(\cdot)$, the terminal sets are $T = \{\varepsilon_{k-1}\}$ and $T = \{\varepsilon_k\}$, respectively.
In both cases, the function set can be defined as $F = \{+, *, /, \xi_1(\cdot), \ldots, \xi_l(\cdot)\}$,
where $\xi_k(\cdot)$ is a non-linear univariate function and, consequently, the number
of populations is $n_p = 2$. Since the terminal and function sets are given, the
approach described in the previous sections can be easily adopted for the
identification of $q(\cdot)$ and $r(\cdot)$.
Fig. 12.5. Exemplary tree representing $r(\varepsilon_k)$



First, let us define the identification criterion, constituting a necessary
ingredient of the $Q_{k-1}$ and $R_k$ selection process. Since the instrumental
matrices should be chosen so as to satisfy (12.79), the selection of $Q_{k-1}$ and
$R_k$ can be performed according to

$$(Q_{k-1}, R_k) = \arg\max_{q(\varepsilon_{k-1}),\, r(\varepsilon_k)} J_{\text{obs},1}(q(\varepsilon_{k-1}), r(\varepsilon_k)), \tag{12.83}$$

where

$$J_{\text{obs},1}(q(\varepsilon_{k-1}), r(\varepsilon_k)) = \sum_{k=0}^{n_t-1} \mathrm{trace}\, P_k. \tag{12.84}$$

On the other hand, owing to FDI requirements, it is clear that the output
error should be near zero in the fault-free mode. In this case, one can define
another identification criterion:

$$(Q_{k-1}, R_k) = \arg\min_{q(\varepsilon_{k-1}),\, r(\varepsilon_k)} J_{\text{obs},2}(q(\varepsilon_{k-1}), r(\varepsilon_k)), \tag{12.85}$$

where

$$J_{\text{obs},2}(q(\varepsilon_{k-1}), r(\varepsilon_k)) = \sum_{k=0}^{n_t-1} \varepsilon_k^T\varepsilon_k. \tag{12.86}$$

Therefore, in order to join (12.83) and (12.85), the following identification
criterion is employed:

$$(Q_{k-1}, R_k) = \arg\min_{q(\varepsilon_{k-1}),\, r(\varepsilon_k)} J_{\text{obs},3}(q(\varepsilon_{k-1}), r(\varepsilon_k)), \tag{12.87}$$

where

$$J_{\text{obs},3}(q(\varepsilon_{k-1}), r(\varepsilon_k)) = \frac{J_{\text{obs},2}(q(\varepsilon_{k-1}), r(\varepsilon_k))}{J_{\text{obs},1}(q(\varepsilon_{k-1}), r(\varepsilon_k))}. \tag{12.88}$$

Since the identification criterion is established, it is straightforward to use
the GP algorithm detailed in Table 12.1.
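
Given the sequences of $\mathrm{trace}\, P_k$ and $\varepsilon_k$ recorded during one observer run, the joint criterion is a simple ratio. The sketch below assumes that these sequences are already available as plain lists; the numbers are hypothetical.

```python
def j_obs3(traces_p, errors):
    """Ratio of the summed squared output errors to the summed traces of P_k."""
    j1 = sum(traces_p)                                      # sum of trace P_k
    j2 = sum(sum(e * e for e in eps) for eps in errors)     # sum of eps^T eps
    return j2 / j1


criterion = j_obs3(traces_p=[2.0, 2.0], errors=[[1.0, 0.0], [0.0, 1.0]])
```

Minimising the ratio rewards individuals $q(\cdot)$, $r(\cdot)$ that keep the output error small while keeping $\mathrm{trace}\, P_k$ large, so both requirements are handled by a single fitness value.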
12.3.5. EUIO-based sensor FDI

In order to design a fault detection and isolation scheme for a real industrial 
system, it is insufficient to only design an observer and check that the norm 
of the residual (or the output error) has exceeded a prespecified maximum 
admissible value T (threshold). Indeed, this condition is necessary only for 
fault detection. For the purpose of fault isolation, it will be desirable to design 
a bank of observers where each of the observers is sensitive to one fault while 
insensitive to others. Unfortunately, such a requirement is rather difficult to 
attain. A more realistic approach is to design a bank of observers where each 
of the observers is sensitive to all faults but one. 

In this section, the faults are divided into two categories, i.e., sensor and 
actuator faults. First, the sensor fault detection scheme is described. In this 
case, the actuators are assumed to be fault free, and hence for each of the 
observers the system can be characterised as follows: 



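$$x_{k+1} = g(x_k) + h(u_k) + E_k d_k, \tag{12.89a}$$

$$y_{k+1}^j = C_{k+1}^j x_{k+1} + f_{k+1}^j, \tag{12.89b}$$

$$y_{j,k+1} = c_{j,k+1} x_{k+1} + f_{j,k+1}, \quad j = 1, \ldots, m, \tag{12.89c}$$

where, similarly to the way it was done in (Chen and Patton, 1999), $c_{j,k} \in \mathbb{R}^n$
is the $j$-th row of the matrix $C_k$, $C_k^j \in \mathbb{R}^{(m-1) \times n}$ is obtained from the
matrix $C_k$ by deleting the $j$-th row $c_{j,k}$, $y_{j,k+1}$ is the $j$-th element of
$y_{k+1}$, and $y_{k+1}^j \in \mathbb{R}^{m-1}$ is obtained from the vector $y_{k+1}$ by deleting the
$j$-th component $y_{j,k+1}$. Thus, the problem reduces to designing $m$ EUIOs
(Fig. 12.6), where each of the observers is constructed using all input data
sets and all output data sets but one: $\{u_k, y_k^j\}$. When all sensors but the
$j$-th one are fault free, and all actuators are fault free, then the residual
$r_{j,k} = y_k^j - \hat{y}_k^j$ will satisfy the following isolation logic:

$$\|r_{j,k}\| < \epsilon_j^H, \tag{12.90a}$$

$$\|r_{l,k}\| \geq \epsilon_l^H, \quad l = 1, \ldots, j-1, j+1, \ldots, m, \tag{12.90b}$$

where $\epsilon_i^H$ denotes a prespecified threshold.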
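The isolation logic above amounts to a simple pattern check on the residual norms of the observer bank. The following sketch is a hedged illustration only: the function name, the zero-based indexing and the numerical values are assumptions made for the example, and the residual norms are taken as already computed.

```python
def isolate_sensor_fault(residual_norms, thresholds):
    """Pattern check in the spirit of (12.90a)-(12.90b).

    residual_norms[j] belongs to the observer that does NOT use sensor j and
    thresholds[j] is its prespecified threshold. The function returns the
    (zero-based) index of the sensor pointed to by the pattern, or None when
    the pattern does not single out exactly one sensor.
    """
    below = [norm < limit for norm, limit in zip(residual_norms, thresholds)]
    # The observer that ignores the faulty sensor is the only one whose
    # residual stays small, so exactly one entry must be below its threshold.
    if len(below) > 1 and below.count(True) == 1:
        return below.index(True)
    return None


# Hypothetical residual norms for a bank of three observers.
limits = [0.1, 0.1, 0.1]
print(isolate_sensor_fault([1.3, 0.02, 0.9], limits))
print(isolate_sensor_fault([0.01, 0.02, 0.03], limits))
```

When all residuals stay below their thresholds, the function returns None, which corresponds to the fault-free situation rather than to an isolated fault.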

12.3.6. EUIO-based actuator FDI

Similarly to the case of the sensor fault isolation scheme, in order to design an 
actuator fault isolation scheme, it is necessary to assume that all sensors are 
fault free. Moreover, the term $h(u_k)$ in (12.49a) should have the following
structure:

$$h(u_k) = B_k u_k. \tag{12.91}$$

Fig. 12.6. Sensor fault detection and isolation scheme



In this case, for each of the observers the system can be characterised as
follows:

$$x_{k+1} = g(x_k) + h^i(u_k^i + f_k^i) + h_i(u_{i,k} + f_{i,k}) + E_k d_k$$

$$\phantom{x_{k+1}} = g(x_k) + h^i(u_k^i + f_k^i) + E_k^i d_k^i, \tag{12.92}$$

$$y_{k+1} = C_{k+1} x_{k+1}, \quad i = 1, \ldots, r, \tag{12.93}$$

where

$$h^i(u_k^i + f_k^i) = [b_{1,k}, \ldots, b_{i-1,k}, b_{i+1,k}, \ldots, b_{r,k}]\,(u_k^i + f_k^i),$$

$$h_i(u_{i,k} + f_{i,k}) = b_{i,k}\,(u_{i,k} + f_{i,k}), \tag{12.94}$$

$$E_k^i = [E_k \;\; b_{i,k}], \quad d_k^i = \begin{bmatrix} d_k \\ u_{i,k} + f_{i,k} \end{bmatrix}. \tag{12.95}$$

Thus, the problem reduces to designing $r$ EUIOs (Fig. 12.7). When all
actuators but the $i$-th one are fault free, and all sensors are fault free, then
the residual $r_{i,k} = y_k - \hat{y}_{i,k}$ will satisfy the following isolation logic:

$$\|r_{i,k}\| < \epsilon_i^H, \tag{12.96a}$$

$$\|r_{l,k}\| \geq \epsilon_l^H, \quad l = 1, \ldots, i-1, i+1, \ldots, r. \tag{12.96b}$$

Fig. 12.7. Actuator fault detection and isolation scheme

12.4. Experimental results

12.4.1. System identification with GP 

The main objective of further investigations is to show the reliability and 
effectiveness of the system identification technique proposed in the present 
chapter. In particular, real data from an industrial plant were employed to 
identify both the input-output and state-space models of chosen parts of the 
plant. The models to be obtained are as follows (cf. Fig. 12.8): 

• The vapour model 

The input and output vectors: 

$u_k$ = (T51_07), $y_k$ = (P51_03).



Table 12.2. Specification of process variables

Variable     Description
F51_01       Thin juice flow at the inlet of the evaporation station
F51_02       Steam flow at the inlet of the evaporation station
LC51_03      Juice level in the first section of the evaporation station
P51_03       Vapour pressure in the first section of the evaporation station
P51_04       Juice pressure at the inlet of the evaporation station
P51_05       Juice pressure at the inlet of the valve
P51_06       Juice pressure at the outlet of the valve
T51_01       Juice temperature at the outlet of the valve
T51_06       Input steam temperature
T51_07       Vapour temperature in the first section of the evaporation station
T51_08       Juice temperature at the outlet of the first section of
             the evaporation station
TC51_05      Thin juice temperature at the outlet of the heater
LC51_03CV    Control value
LC51_03X     Servomotor rod displacement




Fig. 12.8. Scheme of the first stage of the evaporation station

• The apparatus model

The input and output vectors:

$u_k$ = (T51_06, TC51_05, F51_01, F51_02), $y_k$ = (T51_08).

• The control valve model

The input and output vectors:

$u_k$ = (LC51_03CV, P51_05, P51_06, T51_01),

$y_k$ = (F51_01, LC51_03X).

The data used for the identification and validation of the vapour and
apparatus models were collected from two different shifts. The data from the
first shift were used for the identification, and the data from the second one
formed the validation data set. It was not possible to manipulate the system
at all, so the data were collected under the normal production mode. It
is well understood that any change in the continuous production
mode may be very dangerous for system safety, and hence it may lead to
serious economic losses.

Unfortunately, the data turned out to be sampled too fast (the sampling
period selected by the system operator was 10 s). Thus, every 10th value was
picked, after proper prefiltering (Ljung, 1988, Section 14.5), resulting in
identification and validation data sets of $n_t = n_v = 700$ elements each. After this,
the offset levels were removed with the use of the MATLAB System Identification
Toolbox.
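The preprocessing chain described above (prefiltering, keeping every 10th sample, removing the offset) can be sketched in a few lines. This is an illustration under stated assumptions, not the procedure actually used: a causal moving average stands in for the unspecified prefilter, the sample mean stands in for the offset level, and the raw record is synthetic.

```python
def moving_average(signal, width):
    """Causal moving-average low-pass filter (a simple stand-in prefilter)."""
    filtered = []
    running_sum = 0.0
    for i, value in enumerate(signal):
        running_sum += value
        if i >= width:
            running_sum -= signal[i - width]
        filtered.append(running_sum / min(i + 1, width))
    return filtered


def decimate(signal, factor):
    """Low-pass filter the signal, then keep every `factor`-th sample."""
    return moving_average(signal, factor)[factor - 1::factor]


def remove_offset(signal):
    """Subtract the sample mean so that the record has zero offset."""
    mean = sum(signal) / len(signal)
    return [value - mean for value in signal]


# Synthetic raw record: 7000 samples reduce to 700 after decimation by 10.
raw = [5.0 + 0.001 * k for k in range(7000)]
prepared = remove_offset(decimate(raw, 10))
print(len(prepared))
```

Filtering before decimation matters because simply dropping samples would fold high-frequency content into the retained band.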

Unlike the vapour and apparatus models, the valve model was further
used for the purpose of fault diagnosis. In order to check the reliability and
performance of the fault diagnosis system, it was necessary to generate faults.
It is obvious that it is very hard or even impossible to generate faults in a
real industrial plant. Thus, an actuator valve simulator was developed in
MATLAB Simulink. This tool makes it possible to generate data for a
total of 19 faults as well as for the normal operating mode. The analysed
faults are described in Table 12.3. The faults can be considered either as
abrupt or incipient. A comprehensive description of all faults can be found
on the (DAMADICS, 2002) website.

12.4.1.1. Vapour model 

The objective of this section is to design the input-output vapour model
according to the approach described in Section 12.2.3. The input-output
structure of the model was assumed for presentation purposes only,
which means that this model structure is not necessarily
well suited to tackle this identification problem.

Table 12.3. Set of faults considered for the benchmark (abrupt faults:
S - small, M - medium, B - big, I - incipient faults)

Fault  Description                                   S  M  B  I
f1     Valve clogging                                x  x  x  -
f2     Valve plug or valve seat sedimentation        -  -  x  x
f3     Valve plug or valve seat erosion              -  -  -  x
f4     Increase in valve or bushing friction         -  -  -  x
f5     External leakage                              -  -  -  x
f6     Internal leakage (valve tightness)            -  -  -  x
f7     Medium evaporation or critical flow           x  x  x  x
f8     Twisted servomotor's piston rod               x  x  x  -
f9     Servomotor's housing or terminals tightness   -  -  -  x
f10    Servomotor's diaphragm perforation            x  x  x  -
f11    Servomotor's spring fault                     -  -  -  x
f12    Electro-pneumatic transducer fault            x  x  x  -
f13    Rod displacement sensor fault                 x  x  x  x
f14    Pressure sensor fault                         x  x  x  -
f15    Positioner feedback fault                     -  -  x  -
f16    Positioner supply pressure drop               x  x  x  -
f17    Unexpected pressure change across the valve   -  -  x  x
f18    Fully or partly opened bypass valves          x  x  x  x
f19    Flow rate sensor fault                        x  x  x  -

The parameters used during the identification process were: $P_{\mathrm{cross}} = 0.8$,
$P_{\mathrm{mut}} = 0.01$, $n_m = 200$, $n_d = 10$, $n_s = 10$, $F = \{+, *, /\}$. Moreover, for
the sake of comparison, the ARX model was obtained. In both the ARX
and non-linear input-output model cases the order of the model was tested
in the range $n_y = n_u = 1, \ldots, 4$.

The experimental results showed that the best-suited ARX model is of
the order $n_y = n_u = 4$. By contrast, after 50 runs of the GP algorithm
performed for each model order, it was found that the order of the model
which provides the best approximation quality is $n_y = n_u = 2$. The best
model structure obtained is given by

$$y_k = \frac{(p_2 u_{k-2} + p_1 y_{k-2})\, u_{k-1}^2 + (p_5 u_{k-2} y_{k-1} + p_6 u_{k-2}^2 + p_8 y_{k-1}^2 + p_4 y_{k-1} u_{k-2} + p_9)\, u_{k-1} + p_7 u_{k-2} y_{k-1}^2 + p_3 y_{k-1} u_{k-2}}{p_{10} y_{k-1} + p_{11} y_{k-1}^2 + p_{12} y_{k-1} u_{k-2} + p_{13}}, \tag{12.97}$$

where

$$p = (-0.021, 0.495, 1.682, -0.832, 0.601, 0.877, -1.396, 1.206, 1.931, -0.091, 0.067, -0.038, 0.495). \tag{12.98}$$

A comparative study performed for the ARX and GP models shows that the 
GP model is superior to the ARX model. Indeed, the mean-squared output 
error was 1.5 and 3.77 for the GP and ARX models, respectively. However, 
this superiority can be particularly clearly seen in the case of the validation 
data set, for which the mean-squared error was 6.7 and 21.5 for the GP 
and ARX models, respectively. From these results it can be seen that the 
introduction of the non-linear model has significantly improved modelling 
performance. While a linear model may be acceptable in the case of the 
identification data set, it is clear that its generalisation abilities are rather 
unsatisfactory, which was shown through the test on the validation data set. 
The response of the model obtained for both the identification and validation 
data sets is given in Fig. 12.9. 




Fig. 12.9. System (solid line) and model (dashed line) output 
for the identification (a) and validation (b) data sets 



The main drawback of the GP-based identification algorithm concerns its
convergence abilities. Indeed, it is very difficult to establish conditions
which can guarantee the convergence of the proposed algorithm. On
the other hand, many examples treated in the literature (Esparcia-Alcazar,
1998; Gray et al., 1998; Koza, 1992), as well as the authors' experience with
GP (Witczak and Korbicz, 2000a; 2000b), confirm the particular usefulness
of the algorithm, in spite of the lack of a convergence proof. In the case of
the presented example, the average fitness (the mean-squared output error
for the identification data set) for the 50 runs of the algorithm, shown in
Fig. 12.10, confirms the modelling abilities of the approach.

Fig. 12.10. Average fitness for 50 runs of the algorithm




Fig. 12.11. Histogram representing the fitness of 50 models 



Moreover, based on the fitness attained by each of the 50 models (resulting
from the 50 runs), it is possible to obtain a histogram representing the
obtained fitness values (Fig. 12.11) as well as the fitness confidence region.
Let $\alpha = 0.99$ denote the confidence level. Then the corresponding confidence
region can be defined as

$$J_m \in \left[ j_m - t_\alpha \frac{s}{\sqrt{n}},\; j_m + t_\alpha \frac{s}{\sqrt{n}} \right], \tag{12.99}$$

where $j_m = 1.89$ and $s = 0.64$ denote the mean and the standard deviation
of the fitness of the $n = 50$ models, while $t_\alpha = 2.58$ is the normal
distribution quantile. According to (12.99), the fitness confidence region is
$J_m \in [1.65, 2.12]$, which means that the probability that the true mean fitness
$J_m$ belongs to this region is 99%. On the other hand, owing to the multimodal
properties of the identification index, it can be observed (Fig. 12.11)
that there are two optima resulting in models of different quality. However, it
should be pointed out that, on average (Fig. 12.10), the algorithm converges
to the optimum resulting in models of better quality. The convergence
abilities of the algorithm can be further increased by the application of various
parameter control strategies, e.g., for $P_{\mathrm{cross}}$ (Eiben et al., 1999),
although this is beyond the scope of this chapter.
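The quoted confidence region can be checked numerically. The sketch below assumes that (12.99) is the usual normal-approximation interval for a mean, with half-width $t_\alpha s / \sqrt{n}$ and $n = 50$ runs; the function name is hypothetical.

```python
import math


def mean_confidence_interval(mean, std, n, quantile):
    """Two-sided confidence interval for a mean under a normal approximation."""
    half_width = quantile * std / math.sqrt(n)
    return mean - half_width, mean + half_width


# Values quoted in the text for the 50 GP runs of the vapour model.
lower, upper = mean_confidence_interval(mean=1.89, std=0.64, n=50, quantile=2.58)
print(f"[{lower:.3f}, {upper:.3f}]")
```

Under this assumption the half-width is about 0.23, which agrees with the interval reported in the text to within rounding.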

The above results confirm that even if there is no convergence proof, the 
proposed approach can be successfully used to tackle the non-linear system 
identification problem. 

12.4.1.2. Apparatus model 

The objective of this section is to design a state-space apparatus model
according to the approach described in Section 12.2.5. The parameters used in
the GP algorithm are the same as in Section 12.4.1.1. Similarly, for the sake
of comparison, a linear state-space model was obtained. In both the linear
and non-linear cases the order of the model was tested in the range $n = 2, \ldots, 4$.

The experimental results showed that the best-suited linear model is of
the order $n = 4$. After 50 runs of the GP algorithm performed for each
model order, it was found that the order of the model which provides the
best approximation quality is $n = 2$. The best obtained model structure is
given by

$$x_{1,k+1} = \tanh(s_{1,1})\, x_{1,k} + h_1(u_k), \tag{12.100a}$$

$$x_{2,k+1} = \tanh(s_{2,2})\, x_{2,k} + h_2(u_k), \tag{12.100b}$$

where

si,i 

hl('lXfc) — {^l,k “I” {'^l,k 4“ 2u4.^k 4” ^4,A:^l,A;) 

X i'^l,k 4- U4^k 4- Us^k 4- U4^ky'l^k)^Us^k 

+ U3,k (y^l^k 4- {Ui^k 4- U4^k + y'3,k 4" ^4,A;^1,A:) 



= -0.13x2,fc, 52,2 = 



X2,k 



Xl,k {x2,kXl^k 4 - 2^2 ^ + 1 ) ’ 



( 12 . 101 ) 



X 



'^l,k 4 - 



'^A,k'^3,k^l,k 
y'4,k 4 - U2,k 



4" ‘^u^^k 



( 12 . 102 ) 



$$h_2(u_k) = u_{1,k} + u_{2,k}, \tag{12.103}$$

and C = [0.21 * 10“^, 0.51]. A comparative study performed for the linear
model and the GP model shows that the GP model is superior to the linear
state-space one. Indeed, the mean-squared output error was 0.05 and 0.2 for
the GP and linear state-space models, respectively. This superiority was also
confirmed in the case of the validation data set, for which the mean-squared
error was 0.5 and 2.1 for the GP and linear state-space models, respectively.

From these results it can be seen that the proposed non-linear state-space
model identification approach can be effectively applied to various system
identification tasks. The response of the model obtained for both the
identification and validation data sets is given in Fig. 12.12. The average
fitness (the mean-squared output error for the identification data set) for the
50 runs of the algorithm, shown in Fig. 12.13, confirms the modelling abilities
of the approach. As previously, based on the fitness attained by each of the
50 models (resulting from the 50 runs), it is possible to obtain a histogram
representing the achieved fitness values (Fig. 12.14) as well as the fitness
confidence region. According to (12.99), the fitness confidence region is
$J_m \in [0.06, 0.078]$ (for $s = 0.02$, $j_m = 0.07$), which means that the
probability that the true mean fitness $J_m$ belongs to this region is 99%. As in
the previous section, it can be observed (Fig. 12.14) that there are two optima
in the model space. However, on average, the algorithm converges to the
optimum resulting in models of better quality.

Fig. 12.12. System (solid line) and model (dashed line) output for
the identification (a) and validation (b) data sets

Fig. 12.13. Average fitness for 50 runs of the algorithm

Fig. 12.14. Histogram representing the fitness of 50 models



12.4.1.3. Valve actuator model 

The objective of this section is to design the state-space model of the analysed
actuator (cf. Fig. 12.8) according to the approach described in Section 12.2.5.
The parameters used in the GP algorithm are the same as in Section 12.4.1.1.
Similarly, for the sake of comparison, the linear state-space model was
obtained with the use of the MATLAB System Identification Toolbox. In both
the linear and non-linear cases the order of the model was tested in the range
$n = 2, \ldots, 8$. Unfortunately, the relation between the input $u_k$ and the
juice flow $y_{1,k}$ cannot be modelled by a linear state-space model. Indeed,
the modelling error was approximately 35%, which makes the linear model
unacceptable. On the other hand, the relation between the input $u_k$ and
the rod displacement $y_{2,k}$ can be modelled, with very good results, by the
linear state-space model. Bearing this in mind, the identification process was
decomposed into two phases:

(i) the derivation of the relation between the rod displacement and the
input with a linear state-space model;

(ii) the derivation of the relation between the juice flow and the input with
a non-linear state-space model designed by GP.

The experimental results showed that the best-suited linear model of the
rod displacement is of the order $n = 2$. After 50 runs of the GP algorithm
performed for each model order, it was found that the order of the model
which provides the best approximation quality is $n = 2$. Thus, as a result
of combining both the linear and non-linear models, the following model
structure of the actuator was obtained:





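$$x_{k+1} = \begin{bmatrix} A_F(x_k) & 0 \\ 0 & A_X \end{bmatrix} x_k + \begin{bmatrix} h(u_k) \\ B_X u_k \end{bmatrix}, \tag{12.104a}$$

$$y_{k+1} = C x_{k+1}, \tag{12.104b}$$

where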



Arixk) = diag 



O.Stanh 



+ 23xi^kX2,k + 



26^1, fc 

X2,k + 0.01 



(12.104b) 



0.15tanh 



5x1 1 . + 

«t + o.oi 



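$$A_X = \begin{bmatrix} 0.78786 & -0.28319 \\ 0.41252 & -0.84448 \end{bmatrix}, \quad B_X = \begin{bmatrix} 2.3695 & -1.3587 & -0.29929 & 1.1361 \\ 12.269 & -10.042 & 2.516 & 0.83162 \end{bmatrix},$$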



h{uk) = 



-lM7ul^ + 0.0629w| - 0.5019u^ - 3.0108 m| 
+0.9491(wi,fcW2,fc - Hi.ftWs.fc) - 0.5409^;^^^^^ + 0.9783 
-0.2921(2 ^ + 0.0162ul^ - 0.1289«2 ^ - 0.7733u| 



+0.2438(ui^ftU2,fc ~ ui^kUZyk) — 0.1389 



^l,fe^4,fe 

U2,kU3^k-\-0.01 



+ 0.2513 



$$C = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 0.79 & -0.047 \end{bmatrix}.$$

The mean-squared output error for the model (12.104a)-(12.104b) was
0.0079. The response of the model obtained for the validation data set is
given in Fig. 12.15. The main differences between the behaviour of the model and
the system can be observed (cf. Fig. 12.15) for the non-linear model (the juice
flow) during the saturation of the system. This inaccuracy constitutes the
main part of modelling uncertainty, but in practice this drawback is less
important because it is unrealistic, owing to safety requirements, to expect that
the control system will drive the valve up to its maximum allowable flow rate.
According to (12.99), the fitness confidence region is $J_m \in [0.0093, 0.0105]$
(for $s = 0.0016$, $j_m = 0.0097$), which means that the probability that the
true mean fitness $J_m$ belongs to this region is 99%. Contrary to the previous
sections, it can be observed (Fig. 12.16) that there is only one optimum in
the model space.





Fig. 12.15. System (dotted) and model (solid) outputs (juice flow (a), 
rod displacement (b)) for the validation data set 




Fig. 12.16. Histogram representing the fitness of 50 models (horizontal axis: mean-squared error)

12.4.1.4. State estimation and fault detection of an induction motor



The purpose of this section is to show the reliability and effectiveness of 
the observer-based fault diagnosis scheme described in the present chapter. 
The numerical example considered here is a fifth-order two-phase non-linear 
model of an induction motor, which has already been the subject of a large 
number of various control design applications (Boutayeb and Aubry, 1999). 

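The complete discrete-time model in a stator-fixed (a, b) reference frame is

$$x_{1,k+1} = x_{1,k} + h\left(-\gamma x_{1,k} + \frac{K}{T_r} x_{3,k} + K p\, x_{5,k} x_{4,k} + \frac{1}{\sigma L_s} u_{1,k}\right) + 0.1\, d_{1,k}, \tag{12.106a}$$

$$x_{2,k+1} = x_{2,k} + h\left(-\gamma x_{2,k} - K p\, x_{5,k} x_{3,k} + \frac{K}{T_r} x_{4,k} + \frac{1}{\sigma L_s} u_{2,k}\right) + 0.1\, d_{2,k}, \tag{12.106b}$$

$$x_{3,k+1} = x_{3,k} + h\left(\frac{M}{T_r} x_{1,k} - \frac{1}{T_r} x_{3,k} - p\, x_{5,k} x_{4,k}\right) + d_{1,k}, \tag{12.106c}$$

$$x_{4,k+1} = x_{4,k} + h\left(\frac{M}{T_r} x_{2,k} + p\, x_{5,k} x_{3,k} - \frac{1}{T_r} x_{4,k}\right) + d_{2,k}, \tag{12.106d}$$

$$x_{5,k+1} = x_{5,k} + h\left(\frac{p M}{J L_r} (x_{3,k} x_{2,k} - x_{4,k} x_{1,k}) - \frac{T_L}{J}\right), \tag{12.106e}$$

$$y_{1,k+1} = x_{1,k+1}, \quad y_{2,k+1} = x_{2,k+1}, \tag{12.106f}$$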



where the vector $x_k = (x_{1,k}, \ldots, x_{n,k}) = (i_{sa,k}, i_{sb,k}, \psi_{ra,k}, \psi_{rb,k}, \omega_k)$ represents
the currents, the rotor fluxes, and the angular speed, respectively, while
$u_k = (u_{sa,k}, u_{sb,k})$ is the stator voltage control vector, $p$ is the number of pole
pairs, and $T_L$ is the load torque. The rotor time constant $T_r$ and the
remaining parameters are defined as

$$T_r = \frac{L_r}{R_r}, \quad \sigma = 1 - \frac{M^2}{L_s L_r}, \quad K = \frac{M}{\sigma L_s L_r}, \quad \gamma = \frac{R_s}{\sigma L_s} + \frac{R_r M^2}{\sigma L_s L_r^2}, \tag{12.107}$$



where $R_s$, $R_r$ and $L_s$, $L_r$ are the stator and rotor per-phase resistances
and inductances, respectively, and $J$ is the rotor moment of inertia.

The numerical values of the above parameters are as follows: $R_s = 0.18\ \Omega$,
$R_r = 0.15\ \Omega$, $M = 0.068$ H, $L_s = 0.0699$ H, $L_r = 0.0699$ H,
$J = 0.0586\ \mathrm{kg\,m^2}$, $T_L = 10$ Nm, $p = 1$, and $h = 0.1$ ms. The initial
conditions for the observer and the system are $\hat{x}_0 = (200, 200, 50, 50, 300)$ and
$x_0 = 0$. The unknown input distribution matrix is



$$E_k = \begin{bmatrix} 0.01 & 0 & 1 & 0 & 0 \\ 0 & 0.01 & 0 & 1 & 0 \end{bmatrix}^T, \tag{12.108}$$

and hence, according to (12.36), the matrix $H_k$ is

$$H_k = \begin{bmatrix} 1 & 0 & 100 & 0 & 0 \\ 0 & 1 & 0 & 100 & 0 \end{bmatrix}^T. \tag{12.109}$$

The input signals are

$$u_{1,k} = 300 \cos(0.03k), \quad u_{2,k} = 300 \sin(0.03k). \tag{12.110}$$

The unknown input is defined as

$$d_{1,k} = 0.09 \sin(0.5\pi k) \cos(0.3\pi k), \quad d_{2,k} = 0.09 \sin(0.01k), \tag{12.111}$$

and Po = 10^/. 
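For orientation, the motor model can be simulated in a few lines. The sketch below is an assumption-laden illustration rather than the authors' implementation: it uses the standard fifth-order induction motor equations with the leakage coefficient $\sigma = 1 - M^2/(L_s L_r)$, sets the unknown input to zero, and all function names are hypothetical. The parameter values and input signals are those quoted in the text.

```python
import math

# Parameter values quoted in the text.
R_S, R_R = 0.18, 0.15                # stator and rotor resistances [ohm]
M, L_S, L_R = 0.068, 0.0699, 0.0699  # mutual, stator and rotor inductances [H]
J, T_L, P = 0.0586, 10.0, 1          # inertia [kg m^2], load torque [Nm], pole pairs
H_STEP = 1e-4                        # sampling time h = 0.1 ms

# Derived constants. SIGMA is the usual leakage coefficient (an assumption here).
SIGMA = 1.0 - M ** 2 / (L_S * L_R)
T_R = L_R / R_R
K = M / (SIGMA * L_S * L_R)
GAMMA = R_S / (SIGMA * L_S) + R_R * M ** 2 / (SIGMA * L_S * L_R ** 2)


def stator_voltages(k):
    """Input signals u_1 = 300 cos(0.03 k), u_2 = 300 sin(0.03 k)."""
    return 300.0 * math.cos(0.03 * k), 300.0 * math.sin(0.03 * k)


def motor_step(x, u):
    """One model step with d_k = 0; x = (i_sa, i_sb, psi_ra, psi_rb, omega)."""
    x1, x2, x3, x4, x5 = x
    u1, u2 = u
    return (
        x1 + H_STEP * (-GAMMA * x1 + K / T_R * x3 + K * P * x5 * x4 + u1 / (SIGMA * L_S)),
        x2 + H_STEP * (-GAMMA * x2 - K * P * x5 * x3 + K / T_R * x4 + u2 / (SIGMA * L_S)),
        x3 + H_STEP * (M / T_R * x1 - x3 / T_R - P * x5 * x4),
        x4 + H_STEP * (M / T_R * x2 + P * x5 * x3 - x4 / T_R),
        x5 + H_STEP * (P * M / (J * L_R) * (x3 * x2 - x4 * x1) - T_L / J),
    )


def simulate(steps, x0=(0.0, 0.0, 0.0, 0.0, 0.0)):
    """Iterate the model from x0 and return the list of visited states."""
    trajectory = [x0]
    for k in range(steps):
        trajectory.append(motor_step(trajectory[-1], stator_voltages(k)))
    return trajectory


states = simulate(300)
first_step = states[1]
print(first_step)
print(states[-1])
```

The measured outputs are the first two state components (the stator currents), so a data record for observer design would consist of the pairs of inputs and these two components along the trajectory.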

The following three cases concerning the selection of $Q_{k-1}$ and $R_k$ were
considered:

Case 1: Classical approach (constant values), i.e., $Q_{k-1} = 0.1$ and $R_k = 0.1$.
Case 2: Selection according to (12.81), i.e.,

Qf._^ = lQ^el_^£k-iI+0.QlI, Pfc = 10£fefc7+0.01J. (12.112) 



Case 3: GP-based approach presented in Section 12.3.4. 

It should be pointed out that in all cases the unknown input-free mode
(i.e., $d_k = 0$) is considered. This is because the main purpose of this example
is to show the importance of an appropriate selection of the instrumental
matrices rather than the abilities of disturbance decoupling.

In order to obtain the matrices $Q_{k-1}$ and $R_k$ using the GP-based
approach (Case 3), a set of $n_t = 300$ input-output measurements was generated
according to (12.106a)-(12.106f), and then the approach from Section 12.3.4
was applied. As a result, the following form of the instrumental matrices was
obtained:

Qk_i = {I0hlk_islk-i + 1012£i,fc_i + 103.45ei,;i_i + O.Ol)" I, (12.113) 

Rk = {n2elk + 0.1ei,kS2,k + 0.12)^ I. (12.114) 

The parameters used in the GP algorithm presented in Section 12.3.4 were
$n_m = 200$, $n_d = 10$, $n_s = 10$, $F = \{+, *, /\}$. It should also be pointed out
that the above matrices (12.113)-(12.114) are formed by simple polynomials.
This, however, may not be the case in other applications.

The simulation results (for all cases) are shown in Fig. 12.17. The numeri- 
cal values of the optimisation index (12.88) are as follows: Jobs , 3 = 1.49* 10^ 
{Case i). Jobs , 3 = 1-55 {Case 2), and Jobs ,3 = 1-2* 10“^® {Case 3). Both 
the above results and the plots shown in Fig. 12.17 confirm the relevance of
an appropriate selection of the instrumental matrices. Indeed, as can be seen,
the proposed approach is superior to the classical technique of selecting the
instrumental matrices $Q_{k-1}$ and $R_k$.

Fig. 12.17. State estimation error norm $\|e_k\|_2$ for Case 1
(dash-dotted line), Case 2 (dotted line) and Case 3 (solid line)



The objective of the next example is to show the abilities of disturbance
decoupling. In this case, the unknown input $d_k$ acts on the system
according to (12.111). All simulations were performed with the instrumental
matrices set according to (12.133)-(12.134). Figure 12.18 shows the residual
signal for an observer without unknown input decoupling, i.e., $H_k = 0$. In
this figure it is clear that the unknown input influences the residual signal and
hence it may cause unreliable fault detection (and, consequently, fault
isolation). By contrast, Fig. 12.19 shows the residual signal for an observer
with unknown input decoupling, i.e., $H_k$ was set according to (12.36). In this
case, the residual is almost zero. This confirms the importance of disturbance
decoupling.
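The decoupling matrix used in this example can be reproduced numerically. Equation (12.36) is not repeated in this section, so the sketch below assumes the standard unknown input observer solution $H = E[(CE)^T CE]^{-1}(CE)^T$ (Chen and Patton, 1999); the helper functions are hypothetical and restricted to the two-dimensional unknown input of the motor example.

```python
def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def inverse_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def decoupling_matrix(c, e):
    """H = E [(CE)^T CE]^{-1} (CE)^T for a two-column matrix E (assumed form)."""
    ce = matmul(c, e)
    ce_t = transpose(ce)
    return matmul(e, matmul(inverse_2x2(matmul(ce_t, ce)), ce_t))


# Output matrix of the motor (the outputs are the first two states) and E_k.
C = [[1.0, 0.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0, 0.0]]
E = [[0.01, 0.0], [0.0, 0.01], [1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
H = decoupling_matrix(C, E)

# Decoupling condition: (I - H C) E = E - H (C E) should vanish.
HCE = matmul(H, matmul(C, E))
decoupling_error = max(abs(E[i][j] - HCE[i][j]) for i in range(5) for j in range(2))
print(H)
print(decoupling_error)
```

Up to rounding, the result has rows (1, 0), (0, 1), (100, 0), (0, 100) and (0, 0), and the decoupling error is numerically zero, which is the property that removes the unknown input from the residual.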





Fig. 12.18. Residuals for an observer without unknown input decoupling

Fig. 12.19. Residuals for an observer with unknown input decoupling



The objective of presenting the next example is to show the effectiveness of 
the proposed observer as a residual generator in the presence of an unknown 
input. For that purpose, the following fault scenarios were considered: 

Case 1: an abrupt fault of the yi^k sensor: 



0, k< 140, 
otherwise. 



fs^k — 

Case 2: an abrupt fault of the u\^k actuator: 
fa^k ~ 



0, k< 140, 
0.2ui,jfc, otherwise. 



(12.115) 



(12.116) 




Fig. 12.20. Residuals for a sensor fault

Fig. 12.21. Residuals for an actuator fault



In Figs. 12.20 and 12.21 it can be observed that the residual signal is sen- 
sitive to the faults under consideration. This, together with unknown input 
decoupling, implies that the process of fault detection becomes a relatively 
easy task. 



12.4.1.5. Sensor FDI with EUIO



In this section the sensor fault diagnosis scheme proposed in Section 12.3.5
is implemented and tested against simulated faults. Based on the above
approach, the matrices $C_k^1$ and $C_k^2$ for $m = 2$ observers were defined as

$$C_k^1 = [1 \;\; 0 \;\; 0 \;\; 0 \;\; 0], \quad C_k^2 = [0 \;\; 1 \;\; 0 \;\; 0 \;\; 0], \tag{12.117}$$

and this time the initial condition for both observers was selected as
$\hat{x}_0 = 1$. The matrices $Q_{k-1}$ and $R_k$ for the observers were obtained by simply
modifying the equations (12.133)-(12.134), i.e.,

= (lO^el i 4- 1112efc_i + O.Ol)^ J, (12.118) 

Rk = (2134 + O.lei.fc + 0.12)^ I. (12.119) 

Although such a simple reduction works quite well in the proposed example,
there may be cases for which it is necessary to design the instrumental
matrices for each of the observers separately. The fault signals were simulated
according to the formulae:

$$f_{1,k} = \begin{cases} -100, & k = 100, \ldots, 150, \\ 0, & \text{otherwise}, \end{cases} \tag{12.120}$$

$$f_{2,k} = \begin{cases} 10, & k = 200, \ldots, 250, \\ 0, & \text{otherwise}. \end{cases} \tag{12.121}$$
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Since dk E , q <m = the unknown input distribution matrix takes 
the form Ek = then the matrix Hk obtained with (12.36) 

contributes to the fact that [I — CkHk] — 0 and [I — Ck-\-iHk+i] = 0. This 
leads to the following form of the state estimation error: 

ejfc+i = -Kk+iCkSk - Kk+\L2^kf k ~ Hk+\L2^k+if k+i^ (12.122) 

and, consequently, the residual is 



Vk+i — —Ck-^iKk+iVk- (12.123) 

This is the reason why UIOs cannot be applied to MISO and other systems
for which $[I - C_k H_k] = 0$ and $[I - C_{k+1} H_{k+1}] = 0$. If, however, the
effect of an unknown input is not considered, i.e., $d_k = 0$, then UIOs can be
successfully employed. This is the case in the present example.

The simulation results are shown in Fig. 12.22, in which it can be
seen that the residual signal is almost zero in the fault-free case and
increases significantly when a fault occurs, thus making the process of fault
detection a relatively easy task. Moreover, it should be pointed out that
each of the observers is sensitive to the faults of one sensor only, while
remaining insensitive to the other sensors' faults. This property facilitates
the process of fault isolation. Indeed, as can be seen in Fig. 12.22, the
sensor fault isolation problem is relatively easy to solve.

The purpose of this section is to show the reliability and effectiveness of
the observer-based fault diagnosis scheme described in Section 12.3. In
particular, in order to achieve a robust residual generator, the problem of
decoupling modelling uncertainty, treated as an unknown input, is considered.
The method of selecting an appropriate threshold for the purpose of fault
detection is discussed as well.

Fig. 12.22. Residual signals for Observer 1 (left) and Observer 2 (right)

12.4.1.6. Unknown input estimation and design of instrumental matrices

The determination of the unknown input distribution matrix $E_k$ constitutes
a very important part of the EUIO design procedure. Indeed, without this
kind of knowledge it is impossible to design the EUIO, and consequently, to
achieve robust residual generation.

In a general non-linear case the unknown input estimation problem can
be viewed as an unconstrained optimisation task of the form

$$\hat{d}_k = \arg \min \varepsilon_{k+1}^T \varepsilon_{k+1}. \tag{12.124}$$

Since $\varepsilon_{k+1} = y_{k+1} - \hat{y}_{k+1}$, where

$$x_{k+1} = g(x_k) + h(u_k) + d_k, \tag{12.125}$$

$$y_{k+1} = C_{k+1} x_{k+1}, \tag{12.126}$$

and

$$\hat{x}_{k+1} = g(x_k) + h(u_k) + \hat{d}_k, \tag{12.127}$$

$$\hat{y}_{k+1} = C_{k+1} \hat{x}_{k+1}, \tag{12.128}$$

the optimisation task can be realised by solving the following equation:

$$\frac{\partial}{\partial \hat{d}_k}\, \varepsilon_{k+1}^T \varepsilon_{k+1} = 0, \tag{12.129}$$

which is equivalent to a set of linear equations:

$$C_{k+1}^T C_{k+1} \hat{d}_k = C_{k+1}^T \big[ y_{k+1} - C_{k+1} [g(x_k) + h(u_k)] \big]. \tag{12.130}$$

It should be clearly pointed out that (12.130) can be solved explicitly only
when $\operatorname{rank}(C_{k+1}) = n$. This condition is usually very difficult to attain in
practice, and hence an approximate solution has to be employed, which can
be obtained as follows:

$$\hat{d}_k = \arg \min_{d_k} \big\| C_{k+1}^T C_{k+1} d_k - C_{k+1}^T \varepsilon_{k+1/k} \big\|, \tag{12.131}$$

where $\varepsilon_{k+1/k} = y_{k+1} - C_{k+1} [g(x_k) + h(u_k)]$ is the output error obtained
without the unknown input term, cf. (12.130). Apart from extremely simple
problems, the optimisation problem (12.131) can be solved by any non-linear
programming technique, e.g., by the Broyden-Fletcher-Goldfarb-Shanno method
(Walter and Pronzato, 1997).
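In the special case when the output matrix has full column rank, the linear equations for the unknown input estimate can be solved directly. The sketch below illustrates this case only; it is not the BFGS-based procedure mentioned above. It assumes that the predicted state $g(x_k) + h(u_k)$ is available as a list, and the function names and the small numerical example are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("C^T C is singular: rank(C) < n")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[row][j] -= factor * aug[col][j]
    x = [0.0] * n
    for row in reversed(range(n)):
        tail = sum(aug[row][j] * x[j] for j in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def estimate_unknown_input(c, y_next, x_pred):
    """Solve C^T C d = C^T (y_next - C x_pred), cf. (12.130).

    x_pred stands for g(x_k) + h(u_k), the state predicted without d_k.
    """
    m, n = len(c), len(c[0])
    error = [y_next[i] - sum(c[i][j] * x_pred[j] for j in range(n)) for i in range(m)]
    ctc = [[sum(c[i][p] * c[i][q] for i in range(m)) for q in range(n)] for p in range(n)]
    cte = [sum(c[i][p] * error[i] for i in range(m)) for p in range(n)]
    return solve_linear(ctc, cte)


# Hypothetical example with rank(C) = n = 2 and m = 3 outputs.
C = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
x_pred = [0.5, -1.0]
d_true = [0.2, -0.1]
y_next = [sum(C[i][j] * (x_pred[j] + d_true[j]) for j in range(2)) for i in range(3)]
d_hat = estimate_unknown_input(C, y_next, x_pred)
print(d_hat)
```

When the number of independent outputs is smaller than the state dimension, the matrix $C^T C$ is singular and the solver raises an error, which is exactly the situation in which an approximate solution such as (12.131) is needed.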

As can be seen from (12.124), the unknown input $d_k$ is chosen in such
a way as to minimise the output error (residual) instead of the state
estimation error $e_k$. This simplifies, or sometimes even enables, the
estimation of the unknown input, as the output error can be directly obtained
based on the output of the system and the model. Such a solution can be
fully justified by the fact that the main purpose of robust FDI is to make
the residual robust to the unknown input while this is not necessary for the
state estimation error.

Knowing an estimate $\hat{d}_k$ of the unknown input $d_k$ for the considered
samples $k$, and bearing in mind the condition
$\operatorname{rank}(C_{k+1} E_k) = \operatorname{rank}(E_k)$ (Chen and Patton,
1999, p. 72, Lemma 3.1), it is straightforward, using the approach described
by Chen and Patton (1999), to obtain the disturbance distribution matrix
$E_k$ and, consequently, by (12.36), the unknown input decoupling matrix $H_k$.

In the case of the valve actuator considered, the matrix Hk has the 
following form: 



0.2074 0 0 0 
0.3926 0 0 0 



(12.132) 



Since the matrix $H_k$ is known, the instrumental matrices $Q_{k-1}$ and $R_k$
can be obtained. In order to obtain them using the GP-based approach, a
set of $n_t = 1000$ input-output measurements was utilised, and then the
approach from Section 12.3.4 was applied. As a result, the following form of
the matrices was obtained:

Qk-i = + 12ei,fc-i + 4.32ei,fc_i + 0.02)" J, (12.133) 

Rk = {ll2elk + 0.0l£i,fc£2,fc + 0.02)" /. (12.134) 

The parameters used in the GP algorithm presented in Section 12.3.4
were $n_m = 300$, $n_d = 10$, $n_s = 10$, and $F = \{+, *, /\}$. It should also be
pointed out that the above matrices (12.133)-(12.134) are formed by simple
polynomials. This, however, may not be the case in other applications, where
more sophisticated structures may be required. It should be noted, though,
that the matrices (12.133)-(12.134) have a very similar form to that
obtained for the example presented in (Witczak et al., 2002b).

As a result of introducing unknown input decoupling as well as selecting
an appropriate form of the instrumental matrices $Q_{k-1}$ and $R_k$ for the
EUIO, the mean-squared output error was reduced from 0.0079 to 0.0022.
As can be seen in Figs. 12.15 and 12.23, all these efforts to achieve better
modelling quality pay off and lead to more reliable residual generation.



12.4.1.7. Threshold determination and fault detection 

In practical situations the residual $r_k$ (the output error or innovation in
the stochastic case) cannot be at the zero level, even when no faults occur.
Thus, many efforts have been made (Chen and Patton, 1999) to enhance the
robustness of FDI at the decision-making stage, i.e., while making decisions
based on the residual signal.

Fig. 12.23. System (dotted) output and EUIO (solid) output (juice flow
- left, rod displacement - right)



Indeed, apart from simulation examples concerning some fictitious or very
simple industrial systems, perfect robustness can hardly be expected. In
reality, residuals are affected by noise and modelling uncertainty, which cannot
be decoupled using UIOs. These conditions justify the importance of selecting
an appropriate threshold on which FDI is to be based. One possible solution
is to use a fixed threshold:

$$\|\varepsilon_k\| \le \delta \quad \text{Fault-free mode}, \qquad (12.135a)$$

$$\|\varepsilon_k\| > \delta \quad \text{Faulty mode}, \qquad (12.135b)$$

where $\delta$ denotes a prespecified threshold. Thus, any fault which produces
a residual whose norm does not exceed $\delta$ is not detectable. When a fixed
threshold is chosen, sensitivity to faults is intolerably reduced if the threshold is
too high, whereas the false alarm rate increases if the threshold
is too low.
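
The fixed-threshold rule (12.135) can be sketched in a few lines of Python. This is only an illustration: the function names, the sample residuals and the threshold value 0.05 are assumptions made for the example and do not come from the actuator study, and the Euclidean norm is an assumed choice as well.

```python
import math


def residual_norm(residual):
    """Euclidean norm of a residual vector (assumed choice of norm)."""
    return math.sqrt(sum(r * r for r in residual))


def fixed_threshold_decision(residual, threshold):
    """Apply rule (12.135): faulty if the residual norm exceeds the threshold."""
    return "faulty" if residual_norm(residual) > threshold else "fault-free"


# Hypothetical two-dimensional residual samples (not measured data).
samples = [(0.01, -0.02), (0.03, 0.01), (0.20, -0.15)]
decisions = [fixed_threshold_decision(r, threshold=0.05) for r in samples]
print(decisions)
```

With these made-up numbers only the last sample is flagged. Raising the threshold to 0.3 would hide it as well, which is the loss of sensitivity described above.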

In the stochastic case, the problem of setting a fixed threshold can be
tackled, e.g., by generalised likelihood ratio testing and sequential probability
ratio testing (Willsky and Jones, 1976; Basseville and Nikiforov, 1993). The
main drawback to such techniques is the assumption that the process and
measurement noise is Gaussian, and hence that the residual is Gaussian as well.
In practice, however, this assumption can rarely be expected to hold, e.g.,
because perfect decoupling is not possible.

Another solution, which seems more suitable, consists in the use of the 
so-called adaptive threshold, i.e., a threshold which changes in time according 
to some prespecified rules. For a single residual considered as a scalar variable 
such an adaptive threshold can be defined as follows: 

$$\begin{bmatrix} \varepsilon_k^{\max} \\ \varepsilon_k^{\min} \end{bmatrix} = T(u_{k-1}), \qquad (12.136)$$

where $\varepsilon_k^{\max}$ and $\varepsilon_k^{\min}$ are the bounds of the residual, which can be perceived
as a residual confidence region. Thus, the problem is to design the relation
$T(\cdot)$. Although many solutions have been proposed in the literature (Chen
and Patton, 1999), most of them can only be applied to linear systems.

In this work, a very simple approach to designing $T(\cdot)$ for both linear
and non-linear systems is proposed. It is assumed that $T(\cdot)$ can be approximated
by a finite-order polynomial, which for a single-input system can be
described as follows:

$$T(u_{k-1}) = \sum_{i=1}^{n_r-1} p_i\, u_{k-1}^{i} + p_0, \qquad (12.137)$$

where $p_i$ denotes the confidence region of the $i$-th parameter.

These confidence regions can be easily obtained based on the least-squares
estimate of the parameters and the corresponding Fisher information matrix (Walter
and Pronzato, 1997).
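
One possible reading of the polynomial threshold is sketched below in Python. It assumes that each coefficient is known only as a confidence interval and that the residual bounds are the smallest and largest values the polynomial can take over those intervals. The function names and the coefficient intervals are hypothetical; they are not the values identified for the actuator.

```python
def adaptive_threshold(u_prev, intervals):
    """Return (lower, upper) bounds of T(u) = p_0 + sum_i p_i * u**i.

    intervals[i] = (p_i_min, p_i_max) is the assumed confidence interval of
    the i-th polynomial coefficient; index 0 holds the constant term p_0.
    """
    lower = upper = 0.0
    for power, (p_min, p_max) in enumerate(intervals):
        term = u_prev ** power
        a, b = p_min * term, p_max * term
        # A negative power term swaps the roles of the interval ends.
        lower += min(a, b)
        upper += max(a, b)
    return lower, upper


def is_fault_free(residual, u_prev, intervals):
    """True if the scalar residual lies inside the adaptive bounds."""
    lower, upper = adaptive_threshold(u_prev, intervals)
    return lower <= residual <= upper


# Hypothetical coefficient intervals for a first-order polynomial.
intervals = [(-0.02, 0.02), (-0.01, 0.03)]
print(adaptive_threshold(2.0, intervals))
print(is_fault_free(0.05, 2.0, intervals), is_fault_free(0.05, -1.0, intervals))
```

The same residual value is accepted for one input and rejected for another, which is the point of letting the threshold depend on the input.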

For the analysed actuator, a 99% parameter confidence region was assumed.
Moreover, it was found that only the first input $u_{1,k}$ (control value)
was significant for the residual. Thus, for both residuals (the juice flow and
rod displacement) single-variable polynomials of the order 5 and 3, respectively,
were employed. Fig. 12.24 presents the residuals and their bounds (the
adaptive threshold) for the fault-free mode. With the threshold design method
established, it is possible to check the fault detection
capabilities of the proposed observer-based fault detection scheme. For that
purpose, data sets with faults were generated. It should be pointed out that
only abrupt faults were considered (cf. Table 12.3). Table 12.4 shows the results
of fault detection and provides a comparative study between the results
achieved by the proposed fault detection scheme and the results provided by
Supavatanakul (2002), concerning a qualitative approach to fault detection.




Fig. 12.24. Fault-free residuals and their bounds (juice flow - left, 
rod displacement - right) 







Table 12.4. Results of fault detection (D - detectable,
N - not detectable, PD - possible but hard to detect)

Fault  Description                                   S (small)  M (medium)  B (big)
f_1    Valve clogging                                D          D(D)        D(D)
f_2    Valve plug or valve seat sedimentation                               D(D)
f_7    Medium evaporation or critical flow           D          D(PD)       D(D)
f_8    Twisted servomotor’s piston rod               N          N(N)        N(N)
f_10   Servomotor’s diaphragm perforation            D          D(PD)       D(D)
f_11   Servomotor’s spring fault                                            D(PD)
f_12   Electro-pneumatic transducer fault            N          N(PD)       D(PD)
f_13   Rod displacement sensor fault                 D          D(D)        D(D)
f_15   Positioner feedback fault                                            D(N)
f_16   Positioner supply pressure drop               N          N(PD)       D(D)
f_17   Unexpected pressure change across the valve                          D(PD)
f_18   Fully or partly opened bypass valves          D          D(D)        D(D)
f_19   Flow rate sensor fault                        D          D(PD)       D(D)



The notation given in Table 12.4 can be explained as follows: D means that
we are 100% sure that a fault occurred, N means that it is impossible to detect
a given fault, and PD means that the fault detection system does not provide
enough information for us to be 100% sure that a fault occurred.

From Table 12.4 it can be seen that it is impossible to detect the fault
$f_8$. Indeed, the effect of this fault is at exactly the same level as the effect of
noise. The residual is the same as that for the fault-free case (cf. Fig. 12.24),
and hence this fault cannot be detected. There are also some problems
with several small and/or medium faults. However, it should be pointed out
that all faults (except for $f_8$) which are considered big can be detected.
Undoubtedly, further improvement in fault detectability can be achieved by
introducing more sophisticated decision methods.



12.5. Summary 

One purpose of this paper was to propose a unified framework for the identification
of non-linear dynamic systems. To tackle this problem, a relatively
new genetic programming technique was employed. The main advantage of
the proposed identification technique is that it allows generating the set of
possible model structures in an automatic way. Indeed, contrary to other
well-known approaches (e.g., neural networks or neuro-fuzzy networks), the
designer is not left with a time-consuming trial-and-error procedure. The
only thing which has to be specified by the designer is the set of mathematical
operators and functions underlying the model structure. Another advantage
is that, although the models resulting from this approach are of
the behavioural type, they are more transparent than the most popular rival
structures, i.e., neural networks. This transparency may allow finding
some internal connections between the system and the model, which makes it
possible, after a suitable analysis, to employ the models for the detection and
isolation of system faults. It should
also be pointed out that the state-space models resulting from this approach
are asymptotically stable, which is a priori guaranteed by a suitable model
structure.

The main drawback to the proposed system identification technique is
that it is relatively time consuming. This is caused mainly by the fact that
parameter estimation has to be performed for each of the models in each
generation, which is costly for models non-linear in their
parameters. In addition, computational requirements grow together
with the model order n (for state-space models) or the dimension of the
output vector m (for input-output models). Even if, as in this case, the
identification process is performed off line, it is advantageous to decrease
the required computational burden. This can be achieved by developing
adaptation rules for crossover and mutation probabilities, which is to be the
subject of further investigations.

Another purpose was to propose a new observer which can be employed 
for fault diagnosis. Observers are popular as residual generators for fault de- 
tection (and, consequently, for fault isolation) of both linear and non-linear 
dynamic systems. Their popularity lies in the fact that they can also be
employed for control purposes. There are, of course, many different observers
which can be applied to non-linear and, especially, non-linear deterministic 
systems. Logically, the number of real world applications (not only simulated 
examples) should proliferate, yet this is not the case. It seems that there are 
two main reasons why strong formal methods are not accepted in engineer- 
ing practice. First, the design complexity of most observers for non-linear 
systems does not encourage engineers to apply them in practice. Second, the 
application of observers is limited by the need for non-linear state-space mod- 
els of the system considered, which is usually a serious problem in complex 
industrial systems. This explains why most of the examples considered in the 
literature are devoted to simulated or laboratory systems, e.g., the celebrated 
three- (two- or even four-) tank system, an inverted pendulum, a traveling 
crane, etc. It should be pointed out that this need for state-space models was 
the main reason why the genetic programming-based identification framework 
was developed. 

To tackle the observer design problem, the concept of the extended
unknown input observer was introduced. Moreover, it was shown that an
appropriate selection of the instrumental matrices strongly
influences the convergence properties of the observer. To tackle the instrumental
matrices selection problem, a genetic programming-based approach
was proposed. However, it should be pointed out that the proposed method
does not provide a general solution to all problems. Indeed, it makes it possible
to obtain a particular form of the instrumental matrices $Q_{k-1}$ and $R_k$,
which is only appropriate for the analysed example. This is, in fact, the main
drawback to the proposed approach. Another drawback is that the presented
observer cannot be applied as a residual generator for MISO systems and all
systems for which $[I - C_k H_k] = 0$ and $[I - C_{k+1} H_{k+1}] = 0$.

Another problem arises from the fact that most fault diagnosis systems
suffer from both modelling uncertainty and noise. This means that the proposed
observer should be modified so that it can be applied to
non-linear stochastic systems. This, however, is beyond the scope of the
present work, and determines further research directions (Witczak and Korbicz,
2002a).

The experimental results, covering the model construction of chosen parts 
of an evaporation station at the Lublin Sugar Factory in Poland, confirm the 
reliability and effectiveness of the proposed identification framework. The 
availability of mathematical models resulting from the proposed approach
and, especially, the availability of state-space models make it possible to deepen
the knowledge regarding system behaviour. Moreover, the application of
state-space models together with the EUIO allows designing an efficient fault
diagnosis system. This is especially important for the evaporation station because
the sugar campaign lasts, in fact, three months a year, and hence a
breakdown is equivalent to serious economic losses.

It was shown, using an example with the model of an induction motor, 
that the proposed observer can be a useful tool for both the state estimation 
and fault diagnosis problems of non-linear deterministic systems. 

Another example provided a detailed description of an industrial appli- 
cation study of the proposed fault diagnosis scheme. Starting from a set of 
measurements, it was shown how to employ GP to obtain a state-space model 
of the analysed actuator. The design steps of an EUIO were described in de- 
tail as well. In particular, a practical approach for determining a disturbance 
decoupling matrix was proposed and successfully applied providing robust 
residual generation for fault detection and isolation. The necessity of pro- 
viding an appropriate threshold for decision-making purposes was discussed 
and, as a result, a relatively simple design procedure was proposed. It was 
shown, using a set of faults, that the proposed observer-based fault detection 
scheme can provide good results. Indeed, in the discussed set of 12 faults only 
one was impossible to detect. 

Notation

$t$ - time
$k$ - discrete time
$E(\cdot)$ - expectation operator
$x_k, \hat{x}_k$ ($x(t), \hat{x}(t)$) $\in \mathbb{R}^n$ - state vector and its estimate
$y_k, \hat{y}_k \in \mathbb{R}^m$ - output vector and its estimate
$e_k \in \mathbb{R}^n$ - state estimation error
$\varepsilon_k \in \mathbb{R}^m$ - output error (residual)
$u_k \in \mathbb{R}^r$ - input vector
$d_k \in \mathbb{R}^q$ - unknown input vector, $q \le m$
$w_k, v_k$ - process and measurement noise
$Q_k, R_k$ - covariance matrices of $w_k$ and $v_k$
$p$ - parameter vector
$f_k$ - fault vector
$g(\cdot), h(\cdot)$ - non-linear functions
$E_k \in \mathbb{R}^{n \times q}$ - unknown input distribution matrix
$L_{1,k}, L_{2,k}$ - fault distribution matrices
$n_{1,y}, \ldots, n_{m,y}, n_{1,u}, \ldots, n_{r,u}$ - maximum lags in outputs and inputs
$n_t, n_v$ - number of input-output measurements for identification and validation
$n_p$ - number of populations
$n_m$ - population size
$n_d$ - initial depth of trees
$n_s$ - tournament population size
$T, F$ - terminal and function sets

Chapter 13



GENETIC ALGORITHMS 
IN THE MULTI-OBJECTIVE OPTIMISATION 
OF FAULT DETECTION OBSERVERS 



Zdzisław KOWALCZUK*, Tomasz BIAŁASZEWSKI*



13.1. Introduction 

In engineering research and design, evolutionary algorithms have found an
increasing number of applications (Chen et al., 1996; Fogarty and Bull, 1995;
Grefenstette, 1985; Huang and Wang, 1997; Kirstinsson, 1992; Koziel and
Kordalski, 1996; Kowalczuk et al., 1999a; Li et al., 1997; Linkens and Nyongensa,
1995; Man et al., 1997; Martinez et al., 1996; Park and Kandel,
1994; Sannomiya and Tatemura, 1996; Tanaka et al., 1996; Tang et al., 1996).
The significance of such optimisation methods, which emulate the evolution
of biological systems, is proven by their great usefulness and effectiveness.
Well-known features of biological systems are their ability to regenerate,
perform self-control and reproduce, as well as to adapt to changeable
conditions of existence. In a similar way, we also require that the designed
technical systems be characterised by analogous features within the scope
of adaptation, optimality and immunity. In particular, we can easily formulate
tasks concerning the optimality of solutions and their robustness to
small changes in environmental parameters and to disturbances, which lead to
more effective and reliable engineering systems. Moreover, in many practical
decision-making processes it is essential to optimise several objective
functions simultaneously, and we often have to determine the relations between
the partial objectives considered in order to integrate those objectives into one.

In view of the above, designing optimal systems can be associated with
evolutionary mechanisms existing in nature, which allow eliminating detrimental
features as well as inheriting and developing (expanding) desirable
features in the course of genetic transformations (iterations). The elaborated
evolutionary algorithms simulate the natural laws connected, in particular,
with inheritance, crossover and mutation. They make very efficient tools for
gaining optimal solutions in terms of effectiveness, cost, etc. Important factors
for the use of evolutionary algorithms in optimisation can be found in a
universal realisation of the iterative procedures of stochastic search.

The main goal of this chapter is to present a possible application of
the genetic approach to multi-objective and multi-dimensional optimisation
problems with the use of the notion of Pareto-optimality. The chapter is composed
of six sections. First, a multi-objective optimisation task is defined and
possible methods of solving this problem are discussed. An approach based
on optimality in the Pareto sense and the use of a global optimality index,
which facilitates the final assessment of solutions, is presented. Next, genetic
algorithms are explained, along with the genetic terminology and symbols
applied. The basic procedures of Genetic Algorithms (GAs), including the
technique of niching, are also described. Finally, the methodology of constructing
linear state observers as detection systems is presented. To illustrate
the applicability of the proposed approach, we shall consider two design
problems of detection observers, which serve as the principal element of
fault detection procedures for exemplary control systems
of a remotely piloted aircraft and a ship propulsion system. Robust optimal
detection observers obtained by genetic means demonstrate the potential effectiveness
of the proposed optimisation method, which, in accordance with
assumed requirements, allows designing systems that are sufficiently sensitive
to errors in sensors and actuators while remaining robust
to modelling uncertainties.



13.2. Multi-objective optimisation 

There are many forms of life that have been created as a result of natural 
evolution. On the basis of the existing variety of life, we infer that each of the 
species is optimal with respect to a certain subset of the ‘survival’ criteria. 

An apparent analogy can also be found within the products of human
activity. One does not create only one kind of building, bridge or other facility.
Thus we deal with various goods and their variants. Considering the
automotive industry, for example, we immediately discern that each automotive
vehicle is optimal with respect to a specified set of technical criteria. The
purchaser of an automobile of a fixed class has to make a choice from amongst
all equivalent optimal solutions (a trade-off between the price, reliability and
safety). One may thus state that, in general, permanent consideration of
equivalently optimal solutions and making a final decision are part of human
nature. Unfortunately, based on one (non-weighted) set of criteria one can
only select a set of solutions which are merely ‘mutually non-inferior’.

In engineering decision processes the simultaneous optimisation of several objective
functions is even more essential. Such design problems are
called multi-objective optimisation tasks (Goldberg, 1989; Michalewicz, 1996;
Trebi-Ollennu and White, 1997; Viennet et al., 1996). These problems can be
solved either on the basis of an integrated criterion
(by weighting the partial criteria) or by taking into consideration another
independent assessment, performed according to other criteria. In the following
material, a multi-objective optimisation task is introduced, and suitable
solving methods are described in detail.



13.2.1. Formulation of multi-objective optimisation problems 



Consider the following $n$-dimensional vector $x$ of the parameters searched
for:

$$x = [\,x_1\ x_2\ \ldots\ x_n\,]^T \in \mathbb{R}^n, \quad n \in \mathbb{N}, \qquad (13.1)$$

which is assessed according to an $m$-dimensional vector $f(x)$ of criteria
(objective functions):

$$f(x) = [\,f_1(x)\ f_2(x)\ \ldots\ f_m(x)\,]^T \in \mathbb{R}^m, \quad m \in \mathbb{N}. \qquad (13.2)$$

Assuming that all coordinates of the criterion vector (13.2) represent profit
functions, the multi-objective optimisation task analysed can be formulated
as follows:

$$\max_{x} f(x). \qquad (13.3)$$

At this stage, the problem (13.3) describes a multi-profit maximisation task
without constraints.



13.2.2. Multi-objective optimisation methods 

The methods of solving multi-objective optimality problems can be divided
into two groups: classical (Michalewicz, 1996; Zakian and Al-naib, 1973; Chen
et al., 1996) and ranking methods (scheduling) (Goldberg, 1989; Michalewicz,
1996; Man et al., 1997; Kowalczuk and Bialaszewski, 1999; 2000).

Within the classical multi-objective optimisation methods we distinguish
the methods of: (1) weighted profits (Michalewicz, 1996), (2) the distance
function (Michalewicz, 1996), and (3) sequential inequalities (Zakian and
Al-naib, 1973; Chen et al., 1996), whereas the ranking methods include:
(1) Pareto-optimality ranking (Goldberg, 1989; Michalewicz, 1996; Man et
al., 1997) and (2) ordering with respect to a global optimality level/index
(where the vector profit function $f(x)$ reduces to a single scalar profit
function being maximised (Kowalczuk and Bialaszewski, 1999; 2000)).

A. Classical methods

The essence of the first two classical methods (Michalewicz, 1996) is the integration
of many objectives into one. As a result, the multi-objective maximisation
problem is reduced to the maximisation of a scalar function $g(x)$:

$$\max_{x} f(x) \;\xrightarrow{\;A\;}\; \max_{x} g(x), \qquad (13.4)$$

where $A$ is a suitable operator, $A: \mathbb{R}^m \to \mathbb{R}$.

In the method of weighted profits all coordinates of the profit vector
$f(x)$ are combined into one profit function $g(x)$ by means of the following
transformation $A$:

$$g(x) = w^T f(x), \qquad (13.5)$$

where $w = [\,w_1\ w_2\ \ldots\ w_m\,]^T$ denotes a normalised vector of weights
such that its coordinates fulfil the conditions $w_i \in [0, 1]$, $i = 1, 2, \ldots, m$, and
$\sum_{i=1}^{m} w_i = 1$.

The distance function method consists in integrating the coordinates of
the profit function vector $f(x)$ into a scalar profit function $h(x)$ according
to the following mapping $A$:

$$h(x) = \|f(x) - y\|_{\alpha}, \qquad (13.6)$$

where $y = [\,y_1\ y_2\ \ldots\ y_m\,]^T$ denotes the so-called demand vector,
while $\alpha$ represents a suitable vector norm - usually the Euclidean one,
described by $\alpha = 2$.
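
The two scalarisations (13.5) and (13.6) can be illustrated with a short Python sketch. The function names, the profit vector, the weights and the demand vector are all hypothetical values chosen for the example; the parameter `alpha` stands for the order of the vector norm.

```python
import math


def weighted_profit(f, w):
    """Method of weighted profits (13.5): g(x) = w^T f(x)."""
    if any(not 0.0 <= wi <= 1.0 for wi in w) or not math.isclose(sum(w), 1.0):
        raise ValueError("weights must lie in [0, 1] and sum to 1")
    return sum(wi * fi for wi, fi in zip(w, f))


def distance_to_demand(f, y, alpha=2):
    """Distance function (13.6): alpha-norm of f(x) - y."""
    return sum(abs(fi - yi) ** alpha for fi, yi in zip(f, y)) ** (1.0 / alpha)


# Hypothetical profit vector, weights and demand vector.
f_x = (6.0, 4.0)
print(weighted_profit(f_x, w=(0.25, 0.75)))
print(distance_to_demand(f_x, y=(10.0, 7.0)))
```

Both functions return a single number, so any scalar optimiser can be applied afterwards. The result, however, depends entirely on the chosen weights or demand vector.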

The method of sequential inequalities (Zakian and Al-naib, 1973; Chen
et al., 1996) is based on the transformation of a multi-profit maximisation
task without constraints into a set of tasks of the maximisation of scalar
profit functions with inequality constraints. Thus this problem boils down to
searching for a parameter vector $x$ which does not violate the inequality set

$$f_i(x) \ge M_i, \quad i = 1, 2, \ldots, m, \qquad (13.7)$$

where $f_i(x)$ denotes a particular coordinate of the profit function vector,
while $M_i$ represents a limit value for the $i$-th profit function such that

$$f_i^*(x_i^*) \ge M_i \ge \min_{j} \{ f_i(x_j^*) \}, \quad j = 1, 2, \ldots, m, \qquad (13.8)$$

where $f_i^*(x_i^*)$ is the maximal value of the $i$-th profit function $f_i(x)$ obtained
for a certain parameter vector $x_i^*$ from a scanned set of solutions. Similarly,
$x_j^*$ is a parameter vector denoting a maximum of the $j$-th profit function.
The applied value of $M_i$ determines the weight of the respective criterion in
the analysed multi-profit maximisation problem. With the fulfilment of the
inequality constraints (13.8), the more important the $i$-th profit function, the
smaller the distance $|f_i^* - M_i|$ that should be chosen. A detailed description of this
method, called a moving boundaries algorithm, can be found in (Zakian and
Al-naib, 1973; Chen et al., 1996).
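
A minimal Python sketch of the inequality test follows. It assumes that the bounds on each limit value are read as: the upper bound is the maximum of the i-th profit function over the scanned set, and the lower bound is the smallest value that this profit function takes at the maximisers of the individual profit functions. The function names and the scanned set are hypothetical, and the moving boundaries algorithm itself is not implemented here.

```python
def limit_ranges(profits):
    """Admissible range of each limit M_i under the reading stated above.

    profits: list of profit vectors f(x) for a scanned set of solutions.
    For each j the maximiser x_j* of f_j is located; the range for M_i is
    then [min_j f_i(x_j*), f_i(x_i*)].
    """
    m = len(profits[0])
    maximisers = [max(profits, key=lambda f, j=j: f[j]) for j in range(m)]
    return [(min(fj[i] for fj in maximisers), maximisers[i][i]) for i in range(m)]


def satisfies_limits(f, limits):
    """Inequality set (13.7): f_i(x) >= M_i for every i."""
    return all(fi >= mi for fi, mi in zip(f, limits))


# Hypothetical scanned set with two profit functions.
scanned = [(3.2, 8.0), (6.2, 6.0), (8.0, 1.0), (10.0, 0.3)]
print(limit_ranges(scanned))
limits = (6.0, 5.0)  # chosen inside the admissible ranges
print([f for f in scanned if satisfies_limits(f, limits)])
```

Moving a limit closer to the maximum of its profit function shrinks the admissible set in favour of that criterion, which is how the limits act as implicit weights.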

The above classical methods are simple. Their disadvantage, however, lies in the
arbitrary choice of the weighting vector, demand
vector or limit values for the profit functions. In effect, the obtained solutions
are correct only for the weights, demand or limits applied, which amounts to a
simplification of the multi-profit maximisation problem. Moreover, for the purpose
of integrating all objectives into one within the weighted and distance methods,
the designer has to determine the relations among the objectives, which
is not always possible.

B. Ranking methods 

In contrast to the above, in ranking methods we avoid the arbitrary weighting
of the objectives. Instead, a useful classification of the solutions is applied that
takes into account particular objectives more effectively. Their main representatives
are ranks relating to Pareto-optimality (Goldberg, 1989; Michalewicz,
1996; Man et al., 1997), which allows assessing multi-profit maximisation
solutions as dominated or non-dominated (Pareto-optimal or, in short,
P-optimal). The condition of Pareto-optimality for a maximisation task on the
vector profit function $f(x)$ can be formulated as follows (Goldberg, 1989):

Let us consider two solutions that are characterised by the corresponding
vectors of profit functions $f(x_r), f(x_s) \in \mathbb{R}^m$. The vector $f(x_r)$ is partially
smaller than the vector $f(x_s)$ if and only if for all their coordinates the
following condition is fulfilled:

$$\forall_{i}\ f_i(x_r) \le f_i(x_s) \;\wedge\; \exists_{i}\ f_i(x_r) < f_i(x_s). \qquad (13.9)$$

Thus, in the Pareto sense, a solution $x_r$ is dominated if there exists a
solution $x_s$ whose vector of profit functions $f(x_s)$ is partially better than
$f(x_r)$ in terms of the definition (13.9). A non-dominated solution $x_s$ is
called Pareto-optimal (P-optimal).
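
The domination test of (13.9) translates almost literally into code. The Python sketch below is an illustration only; the function names and the four profit vectors are hypothetical.

```python
def dominates(f_s, f_r):
    """True if profit vector f_s dominates f_r in a maximisation task.

    Following (13.9): f_r is no better in any coordinate and strictly
    worse in at least one.
    """
    no_worse = all(a >= b for a, b in zip(f_s, f_r))
    strictly_better = any(a > b for a, b in zip(f_s, f_r))
    return no_worse and strictly_better


def pareto_optimal(profits):
    """Return the non-dominated (P-optimal) vectors of a list."""
    return [f for f in profits if not any(dominates(g, f) for g in profits)]


# Hypothetical two-objective profit vectors.
candidates = [(6.2, 6.0), (5.2, 5.0), (3.2, 8.0), (4.5, 2.2)]
print(pareto_optimal(candidates))
```

Note that a vector never dominates itself, because the strict inequality fails, so the list can be compared against itself without special handling.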

Example 13.1. Let us consider an assessment of the set of solutions
$\{x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8\}$ in the Pareto sense for a two-dimensional vector
of criteria $f(x) = [\,f_1(x)\ f_2(x)\,]^T$, as shown in Fig. 13.1. By applying
the Pareto-optimality criteria, only the solutions $\{x_1, x_6, x_7, x_8\}$ are shown to
be P-optimal, because none of them is dominated by any other solution.
The P-optimal solutions situated in the dark areas are
mutually equivalent. It results from the figure that the solution $x_2$ of the secondary
Pareto-front is dominated by one solution $x_1$ of the primary P-front,
while the solution $x_3$ is dominated by $x_6$. The solution $x_4$ is dominated by
two solutions, $x_6$ and $x_3$, whereas $x_5$ is dominated by four solutions, $x_1$,
$x_2$, $x_3$ and $x_6$. The solutions $x_7$ and $x_8$, which neither are dominated nor
dominate any other solution, are isolated cases.

Fig. 13.1. Exemplary domination in the Pareto sense for two objectives

Not only does the assessment of solutions concerning their Pareto-optimality
determine the P-optimal set of solutions, but it also allows some
ranking of all possible solutions with respect to the degree of domination.
Namely, each solution is assigned a certain scalar quantity called a rank (Man
et al., 1997). This rank directly relates to the number of individuals in the
current population by which the analysed individual is dominated in the
sense of Pareto. Therefore, the rank $\rho(x_i)$ of a given solution $x_i$ amongst
$N$ possible solutions is calculated according to the following formula:

$$\rho(x_i) = \mu_{\max} - \mu(x_i) + 1, \qquad (13.10)$$

$$\mu_{\max} = \max_{i=1,2,\ldots,N} \mu(x_i), \qquad (13.11)$$

where $\mu(x_i)$ is the degree of domination, i.e., the number of solutions by
which $x_i$ is dominated in the same population, while $\mu_{\max}$ is the maximum
value from amongst all $\mu(x_i)$. This kind of ranking transforms the vector of
profit functions into the scalar (one-dimensional) space.

Example 13.2. As can be seen from Fig. 13.1, the degrees of domination of
the P-optimal solutions are $\mu(x_1) = \mu(x_6) = \mu(x_7) = \mu(x_8) = 0$, because
no solution dominates over them. The remaining solutions have the following
degrees of domination: $\mu(x_2) = \mu(x_3) = 1$, $\mu(x_4) = 2$, $\mu(x_5) = 4$.

Hence the maximal degree of domination amounts to

$$\mu_{\max} = \mu(x_5) = 4. \qquad (13.12)$$

Finally, the following ranks of the analysed solutions result from (13.10):
$\rho(x_1) = \rho(x_6) = \rho(x_7) = \rho(x_8) = 5$, $\rho(x_2) = \rho(x_3) = 4$, $\rho(x_4) = 3$,
$\rho(x_5) = 1$.

The P-optimal solutions $\{x_1, x_6, x_7, x_8\}$, having the maximal rank, constitute
the so-called primary (highest) Pareto-front. The solutions $\{x_2, x_3\}$ constitute
the secondary Pareto-front. The solution $\{x_4\}$ represents the third
Pareto-front, while $\{x_5\}$, having the minimal rank from among all the analysed
solutions, constitutes the fourth Pareto-front (or the fifth w.r.t. the rank
values).
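
The ranking can be reproduced numerically. Since the coordinates of the points in Fig. 13.1 are not given in the text, the Python sketch below uses hypothetical coordinates that were chosen only so that the domination relations match those described in Examples 13.1 and 13.2. The function names are likewise assumptions.

```python
def dominates(f_s, f_r):
    """True if f_s dominates f_r (maximisation), cf. (13.9)."""
    return (all(a >= b for a, b in zip(f_s, f_r))
            and any(a > b for a, b in zip(f_s, f_r)))


def pareto_ranks(profits):
    """Degrees of domination mu and ranks rho = mu_max - mu + 1."""
    mu = {i: sum(dominates(g, f) for j, g in profits.items() if j != i)
          for i, f in profits.items()}
    mu_max = max(mu.values())
    rho = {i: mu_max - m + 1 for i, m in mu.items()}
    return mu, rho


# HYPOTHETICAL coordinates (f_1, f_2) of x_1..x_8; not read from Fig. 13.1.
profits = {1: (3.2, 8.0), 2: (2.2, 7.0), 3: (5.2, 5.0), 4: (4.5, 2.2),
           5: (0.4, 4.0), 6: (6.2, 6.0), 7: (8.0, 1.0), 8: (10.0, 0.3)}
mu, rho = pareto_ranks(profits)
print(mu)
print(rho)
```

For these assumed points the computed degrees of domination and ranks coincide with the values quoted in Example 13.2.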

In the ranking method with respect to Pareto-optimality we effectively
transform the profit vector into a scalar value. The concept of optimality
applied does not, however, give any directions as to the choice of a single
solution from amongst the Pareto-optimal solutions found. Therefore, it is
the designer who has to make an independent judgement of the obtained
solutions.

In order to utilise that freedom, a development of the ranking method
was proposed (Kowalczuk and Bialaszewski, 1999; 2000) that uses the idea
of a global optimality level. In particular, the vector profit function of each
solution computed in the course of the ranking procedure is transformed into
a scalar global optimality level, which allows useful ordering of the obtained
solutions. Certainly, there is a chance of obtaining equal indices of global
optimality for several solutions, which constrains the opportunity of obtaining
the ideal serial ordering of solutions without additional interference by the
designer. Nevertheless, this approach effectively limits the number of the most
desired P-optimal solutions.

The method of estimating the global optimality level is given by the
following procedure:

Procedure 13.1.

1) seek a maximal value $f_{i\max}$ of each profit function from amongst all
of the $N$ solutions (or only the P-optimal ones):

$$\forall_{i=1,2,\ldots,m} \quad f_{i\max} = \max_{j=1,2,\ldots,N} \{ f_i(x_j) \}, \qquad (13.13)$$

2) for each $\eta$ (starting with $\eta = 1$ and a step $\Delta = -0.05$), find the $j$-th
solution $x_j$ which fulfils the following condition:

$$\forall_{i=1,2,\ldots,m} \quad f_i(x_j) \ge \eta\, f_{i\max}, \qquad (13.14)$$

and assign the global optimality level $\eta_j = \eta$.

The method of ordering with respect to the global optimality level permits
a significant reduction of the problem of the ambiguity of P-solutions,
which is quite ‘painful’ for the designer using the Pareto-optimality criteria.



Example 13.3. Let us apply Procedure 13.1 to the set of the eight solutions
from Examples 13.1 and 13.2, depicted in Fig. 13.1. The maximal values of
the coordinates of the profit vector can be represented by the following vector:

$$f_{\max} = [\,f_1(x_8)\ \ f_2(x_1)\,]^T = [\,10\ \ 8\,]^T.$$

In the second step of Procedure 13.1 we obtain the following ordering of the
solutions according to the global optimality level $\eta_j$ ($j = 1, 2, \ldots, 8$):

$x_6 \to \eta_6 = 0.60$,
$x_3 \to \eta_3 = 0.50$,
$x_1 \to \eta_1 = 0.30$,
$x_4 \to \eta_4 = 0.25$,
$x_2 \to \eta_2 = 0.20$,
$x_7 \to \eta_7 = 0.10$,
$x_5, x_8 \to \eta_5 = \eta_8 = 0$.

The solution $x_6$ from the primary Pareto-front has a maximal value of the
global optimality level equal to 0.6. The worst solutions, $x_5$ (from the last
P-front) and $x_8$ (from the primary Pareto-front), have the global optimality
level of the null value. The presented ordering method with respect to the global
optimality level clearly presents the solutions $x_6$ and $x_3$ as the best ones.
What is important, ‘troublesome’ solutions which maximise only some criteria
or one of them (even though they belong to the primary Pareto-front, as $x_7$
and $x_8$) are immediately ‘ruled out’ by virtue of the global optimality level $\eta$
applied.
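
Procedure 13.1 can be sketched in Python as follows. As before, the coordinates of the eight solutions are hypothetical (Fig. 13.1 gives no numbers in the text); they were chosen so that the maximal profits equal 10 and 8 and the resulting levels agree with Example 13.3. The function name is an assumption, and non-negative profits are assumed so that the level 0 is always attainable.

```python
def global_optimality_levels(profits, steps=20):
    """Assign to each solution the largest eta on a grid of width 1/steps
    for which f_i(x_j) >= eta * f_i_max holds for every objective i.

    Assumes non-negative profits, so every solution passes at eta = 0.
    """
    m = len(next(iter(profits.values())))
    f_max = [max(f[i] for f in profits.values()) for i in range(m)]
    levels = {}
    for step in range(steps, -1, -1):      # eta = 1.00, 0.95, ..., 0.00
        eta = step / steps
        for name, f in profits.items():
            if name not in levels and all(
                    fi >= eta * fm for fi, fm in zip(f, f_max)):
                levels[name] = eta
    return f_max, levels


# HYPOTHETICAL coordinates (f_1, f_2) of x_1..x_8; not read from Fig. 13.1.
profits = {1: (3.2, 8.0), 2: (2.2, 7.0), 3: (5.2, 5.0), 4: (4.5, 2.2),
           5: (0.4, 4.0), 6: (6.2, 6.0), 7: (8.0, 1.0), 8: (10.0, 0.3)}
f_max, levels = global_optimality_levels(profits)
print(f_max)
for name in sorted(levels, key=levels.get, reverse=True):
    print(f"x_{name}: eta = {levels[name]:.2f}")
```

The level of a solution is governed by its weakest objective relative to the best value found, which is why solutions that excel in a single criterion end up at the bottom of the ordering.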

The ordering method according to the global optimality level can also be
used for the final assessment of a set of only P-optimal solutions.

Example 13.4. Once we have applied the above method of ordering to the
solutions from the primary Pareto-front $\{x_1, x_6, x_7, x_8\}$, as results from
Fig. 13.1, we obtain the following order:

$x_6 \to \eta_6 = 0.6$,
$x_1 \to \eta_1 = 0.3$,
$x_7 \to \eta_7 = 0.1$,
$x_8 \to \eta_8 = 0$.

The highest optimality level is assigned to solely one solution ($x_6 \to \eta_6 =
0.60$). The application of the ranking and ordering methods at the same time
facilitates a useful determination of one P-optimal solution with the highest
global optimality level. In the analysed simple example of Fig. 13.1 the solution
$x_6$ may be intuitively recognised as the best one, though in higher dimensional
spaces of optimised objectives this need not be so clear.

By virtue of the above simple example, it appears that the examined 
methods of ranking and ordering are both universal and useful. They can be 
used in genetic optimisation processes, as discussed in the following sections 
of this chapter. 

13.3. Genetic algorithms

Evolutionary computation by means of genetic algorithms (Holland, 1975; De
Jong, 1975; Fontiex et al., 1995; Goldberg, 1986; 1989; 1990; Kowalczuk and
Bialaszewski, 2000; Michalewicz, 1996; Renders and Flasse, 1996; Riberiro et
al., 1994; Schoenauer and Michalewicz, 1997; Shi, 1997; Srinivas and Patnaik,
1994; Suzuki, 1995) constitutes the best-known and most effective stochastic
optimisation approach. Genetic algorithms are principally characterised by
processing the encoded forms of the values of the parameters, a simultaneous
search in several regions of the parameter space, and stochastic rules of genetic
expansion. This approach is universal, useful and easily implementable.
Moreover, it is free from the limitations which are often imposed on the
searched space (e.g., its continuity) and on the objective function (e.g., the
existence of derivatives and unimodality). In principle, its effectiveness can
be associated with the numerical simplicity of calculating the objective functions.

Genetic algorithms can be functionally characterised by using the original
vocabulary borrowed from medical genetics (Berg and Singer, 1997). The
sought solution of a multidimensional optimisation task, expressed in the form
of a set of parameters, is called an individual. Each individual has a
structure, which is described by a genotype including a constant number of
chromosomes. Genetic information suitably encoded in chromosomes is
submitted to genetic (evolutionary) procedures. Moreover, each individual can
be interpreted in terms of a phenotype, resulting from the process of
decoding the genotype into an ‘applicability realm’. In particular, the degree of
fitness of each individual to its environment, represented by the vector of
objective functions, is calculated based on the phenotype. The genotype of
an individual is usually composed of one or two chromosomes. An individual
having a pair of chromosomes is referred to as a diploidal individual, while
an individual with one chromosome is called haploidal (and is identified with
this chromosome). Genetic algorithms work on a set of individuals referred
to as a population. A general structure of such a population, composed of N
individuals of the n-ploidal type, is presented in Fig. 13.2.

The parameters of the analysed optimisation task are thus genetically 
represented in chromosomes by a sequence of genes, which are quantities 
(symbols) encoding the value of these parameters. Figure 13.3 depicts the 
structure of a chromosome. The gene is the basic element of the chromosome. 
It has a definite position (locus) in the sequence of the chromosome and 
possesses a specific value called variety (allele). The alleles are represented 
in Fig. 13.3 by different colours. 

The population of individuals in GAs is subject to a simulated evolution,
meaning that in each cycle of the algorithm good solutions reproduce
(i.e., transmit their genetic material into new generations), while bad
individuals die out. The population obtained after one GA cycle is called a new
generation. The evaluation of generations is carried out according to an
objective function, which allows estimating the fitness degree of each individual
based on its decoded (phenotype) form, representing the parameters being
optimised.

Fig. 13.3. Model of the chromosome applied in genetic algorithms

The realisation of evolutionary procedures in GAs consists in selecting
individuals of the highest fitness degrees (as potentially final solutions of
optimisation) for reproduction. The set of selected individuals, referred to as
the parents or parental pool, is involved in creating (in terms of probability)
new individuals, called the offspring, through genetic operations. There are
two basic genetic operations: crossover and mutation. The offspring resulting
from these operations in one cycle make a new generation, and such
evolutionary cycles are repeated until the desired termination condition is satisfied.
It can be, for instance, a limiting number of performed cycles. Encoding the
parameters, assessing the fitness degree, selecting the individuals, and
implementing the genetic operations are discussed in the following parts of
this chapter.

13.3.1. Genotype, phenotype and fitness of individuals 

To simplify the discussion, a few basic notions shall be introduced. For the
analysed multi-objective maximisation task of (13.1)-(13.3), we define the
co-domain of each profit function (the coordinate of the vector $f(x)$) as the
set of positive real numbers $\mathbb{R}_+$:

$$\forall x: \quad f_i(x) > 0, \quad i = 1, 2, \ldots, m. \tag{13.15}$$

A vector of profit functions having that property is called a vector of the
fitness function (Goldberg, 1989; Michalewicz, 1996) or the fitness degree.
Haploidal individuals can be described as the following genotype vectors:
$v_i = [\, v_{1i} \; \ldots \; v_{ni} \,]^T$, $i = 1, 2, \ldots, N$, and $n \in \mathbb{N}$.

The coordinates $v_{ki}$ ($k = 1, 2, \ldots, n$), each represented by a segment of the
chromosome, encode the values of the coordinates $x_k$ of the optimised parameter
vector (13.1) of the multi-objective maximisation task (13.1)-(13.3), while $N$
denotes the number of haploidal individuals (chromosomes) in the population
$V$ described by the following matrix: $V = [\, v_1 \; \ldots \; v_N \,]$.

Consequently, the decoded phenotype of the haploidal individual $v_i$ has the
form $x_i = [\, x_{1i} \; x_{2i} \; \ldots \; x_{ni} \,]^T \in \mathbb{R}^n$, and the vector fitness degree of
such a haploidal individual $v_i$ is simply $f(x_i) \in \mathbb{R}_+^m$.

13.3.2. Basic mechanisms of GAs 

A . Encoding and decoding of parameters 

To represent the sought parameters of the maximisation task (13.1) in GAs,
bi-allele (binary) or tri-allele (trivalent) codes can be utilised (Goldberg,
1989).

The most convenient way of the GA coding of haploidal individuals
is the bi-allele (binary) code, in which the segments (coordinates $v_{ki}$)
of the chromosome of the individual $v_i$ are finite sequences of genes
with alleles from the set $\{0, 1\}$. In the four-dimensional space of $x_i =
[\, x_{1i} \; x_{2i} \; x_{3i} \; x_{4i} \,]^T \in \mathbb{R}^4$ the chromosome can assume the following
exemplary form: $v_i = [\, 0100111101 \;\; 0100111 \;\; 00011011 \;\; 10111 \,]^T$, where
the code sequences $v_{ki}$ of particular parameters may be of various lengths,
which depend on the type of the optimisation task (the objective function
and its arguments, the required accuracy of computations, and the size of
the searched space). The exemplary chromosome $v_i$ of the length of 30 bits
(bi-allele genes) is composed of four binary sequences, which are of the length
of 10, 7, 8 and 5 bits, respectively.

The tri-allele code is usually applied to diploids, composed of chromosome
pairs with the alleles of the genes belonging to a three-element set, e.g.,
$\{-1, 0, 1\}$.

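In the process of decoding binary genetic information included in a
chromosome into a suitable phenotype, being the vector of parameters belonging
to the domain of optimisation, the following linear mapping is applied:

$$x_{ki} = \underline{x}_k + \frac{\overline{x}_k - \underline{x}_k}{2^{m_k} - 1} \sum_{j=0}^{m_k - 1} v_{ki}^{(j)}\, 2^{j}, \quad k = 1, 2, \ldots, n, \quad i = 1, 2, \ldots, N, \tag{13.16}$$

where $x_{ki}$ is the k-th coordinate of the parameter vector (13.2) of the i-th
haploidal individual, the range $[\underline{x}_k, \overline{x}_k] \subset \mathbb{R}$ constitutes the domain of the
parameter $x_{ki}$, $v_{ki}^{(j)} \in \{0, 1\}$ is the j-th bit of the sequence $v_{ki}$,
and $m_k$ denotes the length of the sequence $v_{ki}$ representing
this coordinate.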
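
The decoding step can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the scaling by $2^{m_k} - 1$, the most-significant-bit-first ordering of genes, and the helper names `decode_segment` and `decode_chromosome` are not taken from the text, and the parameter domains $[0, 1]$ are hypothetical.

```python
def decode_segment(bits, lower, upper):
    """Map one binary segment (e.g. "10111") linearly onto [lower, upper]."""
    m_k = len(bits)
    # Assumption: the leftmost gene is the most significant bit.
    integer_value = int(bits, 2)
    return lower + (upper - lower) * integer_value / (2 ** m_k - 1)


def decode_chromosome(chromosome, lengths, bounds):
    """Split a haploid chromosome into segments and decode each of them."""
    phenotype, start = [], 0
    for m_k, (lower, upper) in zip(lengths, bounds):
        segment = chromosome[start:start + m_k]
        phenotype.append(decode_segment(segment, lower, upper))
        start += m_k
    return phenotype


# The exemplary 30-bit chromosome from the text, with hypothetical domains.
chromosome = "0100111101" + "0100111" + "00011011" + "10111"
x = decode_chromosome(chromosome, [10, 7, 8, 5], [(0.0, 1.0)] * 4)
```

With this mapping an all-zero segment decodes to the lower bound of the domain and an all-one segment to the upper bound, so every decoded phenotype stays inside the searched space.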

Presently, the most prevalent genotype representation of parameters is
of the multi-allele (floating point) type (Michalewicz, 1996). It is a natural
way of encoding parameters belonging to the set of real numbers. In such
a case, there is no necessity to separately encode the optimised parameters,
because the genotype of the i-th individual is composed of one chromosome
($v_i$), which can be identified with its phenotype $x_i$: $v_i = x_i \in \mathbb{R}^n$.

Thus the coordinates $v_{ki} = x_{ki}$, $k = 1, 2, \ldots, n$, describing the values of each
optimised parameter, are not chromosome sequences, but single multi-allelic
genes, denoting real numbers.

B. Assessment of the fitness degree 

The vector fitness degree is the basis of the assessment of individuals in
terms of their matching the requirements posed by the objective function. In
Section 13.2 the task of the multi-objective maximisation of profit functions
has been defined. Nevertheless, in general, the coordinates of the
objective-function vector can be both profit and cost functions. Moreover, the values of
the partial objective functions may be positive and negative. In such cases, it
is necessary to transform the objective functions into positive profit functions
(13.15) as follows:

a) for the real cost functions $g_i(x)$:

$$f_i(x) = C_{\max} - g_i(x), \tag{13.17}$$

where the coefficient $C_{\max}$ is not smaller than the maximal value of the
function $g_i(x)$ obtained from among all individuals of the current population;

b) for the real profit functions $u_i(x)$:

$$f_i(x) = u_i(x) - C_{\min}, \tag{13.18}$$

where $C_{\min}$ denotes a value that is not greater than the minimal value of
the function $u_i(x)$ within the population.
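
A minimal Python sketch of the transformations (13.17) and (13.18) is given below. The function name `to_positive_profit` and the optional `margin` argument are illustrative assumptions; with `margin = 0` the worst individual receives exactly zero, so a small positive margin is one possible way of keeping all values strictly positive.

```python
def to_positive_profit(values, kind, margin=0.0):
    """Transform raw objective values of one criterion over the population.

    values -- values of a single objective for all individuals
    kind   -- "cost" (to be minimised) or "profit" (to be maximised)
    margin -- hypothetical extra offset; any margin >= 0 satisfies the
              'not smaller than' / 'not greater than' conditions
    """
    if kind == "cost":
        c_max = max(values) + margin      # C_max >= max g_i(x)
        return [c_max - g for g in values]
    if kind == "profit":
        c_min = min(values) - margin      # C_min <= min u_i(x)
        return [u - c_min for u in values]
    raise ValueError("kind must be 'cost' or 'profit'")
```

For example, the cost values 3, 1, 2 are mapped to the profits 0, 2, 1, so the cheapest individual obtains the largest fitness.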

C. Selection of individuals 

The concept of selecting individuals simulates the mechanism of striving for
survival in nature. We expect that individuals with the highest degrees of
fitness obtain numerous progeny. As a result, these individuals multiply their
genetic material and, in this way, increase the probability of their
penetrating into the next generation (survival). On the other hand, individuals of the
lowest fitness should be eliminated from procreation. Amongst the methods
of selection (Goldberg, 1989; Michalewicz, 1996), the distribution method
and the proportional method with the stochastic-remainder choice can be
distinguished.

In the distribution method we simulate the process of turning a roulette
wheel, whose angular sectors are proportional to the scalar ranks of the
analysed individuals, which are derived from their Pareto-optimality
(Section 13.2). The selection of individuals for the parental pool is based on the
settlement of the position of the roulette wheel being ‘turned’ N times, where
N denotes the assumed number of individuals in the population. The
individuals of higher fitness are allotted wider sectors on the roulette wheel;
thus they have a greater chance to introduce their progeny into the next
generation. Procedure 13.2 presented below describes a method of selection
based on the numerical shaping of the probability distribution by means of
the distribution function.

Procedure 13.2.

1) Assign to each individual in the population a relative fitness, conditioned
by their ranks:

$$p_s(x_i) = \frac{\rho(x_i)}{\sum_{j=1}^{N} \rho(x_j)}, \quad i = 1, 2, \ldots, N. \tag{13.19}$$

2) Determine the distribution function for a fixed sequence of individuals:

$$q(x_i) = \sum_{j=1}^{i} p_s(x_j). \tag{13.20}$$

3) Perform N times the ‘turning’ of the simulated ‘roulette wheel’ by

a) generating a random number $r \in [0, 1]$,

b) selecting the first individual in the sequence which satisfies the inequality $r < q(x_i)$,

c) copying the selected individual $x_i$ to the parental pool.
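
The roulette-wheel steps above can be sketched in Python as follows. This is a hedged illustration rather than the authors' implementation: the function name `roulette_select`, the use of `bisect_right` to find the first individual with $r < q(x_i)$, and the rounding guard on the last value of the distribution function are assumptions.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def roulette_select(ranks, n_draws, rng=random):
    """Return indices of individuals drawn by the distribution method.

    ranks   -- positive scalar ranks of the individuals
    n_draws -- number of 'turns' of the simulated roulette wheel
    """
    total = sum(ranks)
    p_s = [r / total for r in ranks]        # relative fitness
    q = list(accumulate(p_s))               # distribution function
    q[-1] = 1.0                             # guard against rounding errors
    # bisect_right gives the first index i for which r < q[i].
    return [bisect_right(q, rng.random()) for _ in range(n_draws)]


parental_pool = roulette_select([1, 2, 3, 4], n_draws=4)
```

An individual with a larger rank owns a wider part of the interval $[0, 1]$ and is therefore drawn more often.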
The proportional method with the stochastic-remainder choice consists in
calculating the expected number of copies of each individual from its relative
rank-conditioned fitness. The integer part of the number denotes the number
of copies of the given individual which is directly included in the parental
pool, whereas the fractional part of the expected number of copies is utilised
to define the respective angular sector of the roulette wheel used in the
distribution function method given above. This method allows us to suitably
complement the parental pool with additional individuals. The proportional
method with the stochastic-remainder choice (Goldberg, 1989) is presented in
Procedure 13.3 below.

Procedure 13.3.

1) Assign the expected number of copies of each individual in the population:

$$e(x_i) = \frac{\rho(x_i)}{\sum_{j=1}^{N} \rho(x_j)}\, N, \quad i = 1, 2, \ldots, N. \tag{13.21}$$

2) Copy $N_{\mathrm{int}} = \sum_{i=1}^{N} \lfloor e(x_i) \rfloor$ individuals to the parental pool according to
the integer parts of the numbers $e(x_i)$, and define the number of ‘vacancies’
$\bar{N} = N - N_{\mathrm{int}}$.

3) Assign the distribution function of individuals according to the fractional
part of $e(x_i)$:

$$q(x_i) = \frac{1}{\bar{N}} \sum_{j=1}^{i} \bigl( e(x_j) - \lfloor e(x_j) \rfloor \bigr). \tag{13.22}$$

4) Perform $\bar{N}$ times the ‘turning’ of the simulated ‘roulette wheel’ by

a) generating a random number $r \in [0, 1]$,

b) selecting an individual that fulfils the condition

$$q(x_{i-1}) < r \le q(x_i) \quad (\text{with } q(x_0) = 0), \tag{13.23}$$

c) copying the selected individual $x_i$ to the parental pool.
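
A compact Python sketch of the stochastic-remainder choice follows. The function name `stochastic_remainder_select` is hypothetical, and the vacancies are filled with the standard-library call `random.choices`, which draws with probabilities proportional to the supplied weights; this is used here as a stand-in for the explicit distribution function of Step 3.

```python
import math
import random


def stochastic_remainder_select(ranks, rng=random):
    """Return indices of the individuals copied to the parental pool."""
    n = len(ranks)
    total = sum(ranks)
    expected = [n * r / total for r in ranks]   # expected numbers of copies

    pool = []
    for i, e in enumerate(expected):
        pool.extend([i] * math.floor(e))        # guaranteed (integer) copies

    vacancies = n - len(pool)
    if vacancies > 0:
        fractions = [e - math.floor(e) for e in expected]
        # Fill the remaining places with probabilities proportional
        # to the fractional parts of the expected numbers of copies.
        pool.extend(rng.choices(range(n), weights=fractions, k=vacancies))
    return pool


parents = stochastic_remainder_select([2, 1, 1])
```

For the ranks 2, 1, 1 the expected numbers of copies are 1.5, 0.75 and 0.75, so the first individual gets one guaranteed copy and the two remaining places are drawn at random.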

D. Genetic operations: crossover and mutation

Crossover and mutation are the basic genetic operations carried out on the 
individuals of the parental pool (Goldberg, 1989; Michalewicz, 1996). It is 
worth noticing that within these genetic operations the whole chromosome 
of an individual is treated as a single encoded sequence, which is composed of 
particular encoded segments of the chromosome (representing the coordinates 
of the optimised vector), whereas with the floating-point encoding (multi- 
alleles in evolutionary algorithms), the genetic operations are carried out 
separately for each gene (parameter). 
The operation of crossover is analogous to recombination between the
DNA threads of the chromosomes of a homologous pair (Berg and Singer,
1997). Thus, in genetic algorithms, crossover is the exchange of genetic
material (a part of the chromosome) between two parents. As a result, two descendants
are obtained, which constitute new potential solutions. This operation
proceeds in two phases: first, individuals from the parental pool are randomly
paired, and second, each parental pair is submitted to the process of
exchanging its genetic material (i.e., encoded sequences of chromosomes), which
is performed with a definite crossover probability $p_c$ (usually $p_c \in [0.6, 1]$).

Crossover can be performed using the one-point or the multi-point
(usually two- or three-point) method.

One-point crossover starts with choosing, with an identical probability,
a crossing point (the point of the division of the chromosome into two
parts) from among $m - 1$ positions in the chromosome, where $m$ denotes
the chromosome length. Next, the genetic material defined by the crossing
point and the end of the chromosome ($m$) is interchanged between the two
parental individuals, which results in new individuals called the offspring, as
shown in Fig. 13.4(a).



Fig. 13.4. Two schemes of crossover: (a) one-point crossover;
(b) three-point crossover

In the case of multi-point crossover, the process of genetic exchange is
performed between several segments of the chromosomes defined by
consecutive crossing points, generated randomly with an identical probability. An
example of three-point crossover is illustrated in Fig. 13.4(b).
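
The one-point and multi-point schemes can be sketched in Python for chromosomes stored as bit strings. The function names are illustrative assumptions, and the crossover probability $p_c$ is left out for brevity (in a full algorithm the pair would be crossed only with that probability).

```python
import random


def one_point_crossover(parent_a, parent_b, rng=random):
    """Exchange the tails of two equal-length chromosomes."""
    m = len(parent_a)
    cut = rng.randint(1, m - 1)          # one of the m - 1 interior positions
    child_a = parent_a[:cut] + parent_b[cut:]
    child_b = parent_b[:cut] + parent_a[cut:]
    return child_a, child_b


def multi_point_crossover(parent_a, parent_b, n_points, rng=random):
    """Exchange every second segment between consecutive crossing points."""
    m = len(parent_a)
    cuts = sorted(rng.sample(range(1, m), n_points))   # distinct points
    child_a, child_b = [], []
    swap, start = False, 0
    for cut in cuts + [m]:
        src_a, src_b = (parent_b, parent_a) if swap else (parent_a, parent_b)
        child_a.append(src_a[start:cut])
        child_b.append(src_b[start:cut])
        swap, start = not swap, cut
    return "".join(child_a), "".join(child_b)


children = one_point_crossover("0100111101", "1110000011")
```

Crossing an all-zero parent with an all-one parent makes the scheme visible: one-point crossover yields a single change of allele along each child, while three-point crossover yields three.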

With floating-point representation, the so-called arithmetical and mixed
crossovers are usually implemented (Michalewicz, 1996).
An essential feature of both schemes of crossover is that the newly
created offspring represent admissible solutions, which belong to the domain
determined by the parameter range and other linear constraints imposed on
the parameters.

Arithmetical crossover is achieved by a normalised linear combination
of two vectors representing parental multi-allele chromosomes
(phenotypes/genotypes). If the parental pair
$v_r = [\, v_{1r} \; \ldots \; v_{nr} \,]^T$ and $v_s = [\, v_{1s} \; \ldots \; v_{ns} \,]^T$
have been chosen for crossover, then their offspring $v'_r$ and $v'_s$ ($r, s \in
\{1, 2, \ldots, N\}$) are determined according to the following formulae:

$$v'_r = a v_r + (1 - a) v_s, \tag{13.24}$$

$$v'_s = a v_s + (1 - a) v_r, \tag{13.25}$$

where $a \in [0, 1]$ is a randomly generated real number (different for each pair
of parents).

Mixed crossover (called simple according to Michalewicz, 1996) is a
composition of one-point crossover and the arithmetical one. Namely, for the pair of
chromosomes $v_r$ and $v_s$ and the randomly selected k-th coordinate, the
offspring result from the following:

$$v'_r = [\, v_{1r} \; \ldots \; v_{kr} \;\; (a v_{k+1,s} + (1 - a) v_{k+1,r}) \; \ldots \; (a v_{ns} + (1 - a) v_{nr}) \,]^T, \tag{13.26}$$

$$v'_s = [\, v_{1s} \; \ldots \; v_{ks} \;\; (a v_{k+1,r} + (1 - a) v_{k+1,s}) \; \ldots \; (a v_{nr} + (1 - a) v_{ns}) \,]^T, \tag{13.27}$$

where $a \in [0, 1]$ is defined as above. Similarly to arithmetical crossover,
mixed crossover ensures the admissibility of the solutions, meaning the newly
created offspring.
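
Both floating-point operators can be written down directly from the formulae above. The following Python sketch uses hypothetical function names and takes the coefficient $a$ and the split index $k$ as arguments (in the algorithm both would be drawn at random); the chromosomes are plain lists of floats.

```python
def arithmetical_crossover(v_r, v_s, a):
    """Offspring as normalised linear combinations of the two parents."""
    child_r = [a * x + (1 - a) * y for x, y in zip(v_r, v_s)]
    child_s = [a * y + (1 - a) * x for x, y in zip(v_r, v_s)]
    return child_r, child_s


def mixed_crossover(v_r, v_s, k, a):
    """Keep the first k coordinates, blend the remaining ones."""
    tail_r = [a * y + (1 - a) * x for x, y in zip(v_r[k:], v_s[k:])]
    tail_s = [a * x + (1 - a) * y for x, y in zip(v_r[k:], v_s[k:])]
    return v_r[:k] + tail_r, v_s[:k] + tail_s


child_r, child_s = arithmetical_crossover([0.0, 2.0], [4.0, 6.0], a=0.25)
```

Because each offspring coordinate is a convex combination of the parental coordinates, it cannot leave the interval spanned by the parents, which is why admissibility is preserved for box (and other convex) constraints.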

Mutation introduces a random perturbation into the newly generated
offspring by changing the alleles of genes (of all individuals in the population)
in a full or limited scope:

• each gene (of the bi- or multi-allele type) undergoes mutation with an
assumed probability $p_m$;

• only a limited number of genes (selected with the use of a fixed, usually
uniform probability distribution) are randomly modified.

Taking into account the object of this modification, two kinds of
mutation can be distinguished: genotype (genetic) and phenotype (parametric)
mutation.
Genotype mutation, concerning binary sequences, means a random
negation of the allele of the selected gene (in a full or limited scope). Such a kind
of mutation of a haploidal individual is illustrated in Fig. 13.5 for a binary
chromosome code.

Fig. 13.5. Exemplary genotype mutation of a binary sequence
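
Genotype mutation in the full scope (every gene tested independently) can be sketched in a few lines of Python. The function name `mutate_bits` is an assumption made for illustration, and the chromosome is stored as a bit string.

```python
import random


def mutate_bits(chromosome, p_m, rng=random):
    """Negate each gene of a binary chromosome with probability p_m."""
    negation = {"0": "1", "1": "0"}
    return "".join(
        negation[gene] if rng.random() < p_m else gene
        for gene in chromosome
    )


mutant = mutate_bits("0100111101", p_m=0.05)
```

With `p_m = 0` the chromosome is returned unchanged, and with `p_m = 1` every allele is negated.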



In the case of floating-point representation, we implement phenotype
mutation, which can be interpreted as a direct change of selected parameters.
As in arithmetical and mixed crossover, phenotype mutation ensures
generating permissible solutions. This kind of mutation can be performed as a
value-uniform change or a value-nonuniform change (Michalewicz, 1996).

The value-uniform change means that the selected parameter $v_{kq}$ of
the individual $v_q \in \mathbb{R}^n$ is assigned a new value $v'_{kq}$, which is randomly
found in the domain of the k-th mutated parameter according to a
uniform probability distribution. Hence, we obtain a new phenotype $v'_q =
[\, v_{1q} \; v_{2q} \; \ldots \; v'_{kq} \; \ldots \; v_{nq} \,]^T$. The adjective ‘uniform’ concerns not
only the uniformity of the probability distribution, but also the operation of
mutation, which is realised independently of the number of iterations
executed in the evolutionary process of optimisation.

With the value-nonuniform change of the value of the selected
parameter, we enforce the convergence of the mutated parameters by narrowing the
admissible range of uniform changes according to $t$, the (increasing) number
of executed iterations. The selected parameter $v_{kq}$ of the individual $v_q \in \mathbb{R}^n$
then obtains the following value:

$$v'_{kq} = \begin{cases} v_{kq} + \Delta(t, \overline{v}_k - v_{kq}) & \text{(A)} \\ v_{kq} - \Delta(t, v_{kq} - \underline{v}_k) & \text{(B)} \end{cases}, \qquad \Delta(t, \Delta v) = r\, \Delta v \left( 1 - \frac{t}{t_p} \right)^{b}, \tag{13.28}$$

where $[\underline{v}_k, \overline{v}_k] \subset \mathbb{R}$ describes the original domain of the mutated parameter
$v_{kq}$, $t$ denotes the generation number, $t_p$ is a fixed number of all iterations
(generations), $b$ represents a coefficient of heterogeneity (usually, $b = 2$),
while $r \in [0, 1]$ is a random value. Additionally, the choice between the
mutation methods (A or B) is performed randomly, with the probability 0.5.
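
The value-nonuniform change can be sketched in Python as below. The sketch assumes that (13.28) has the form $\Delta(t, \Delta v) = r\, \Delta v\, (1 - t/t_p)^b$; the function names `delta` and `nonuniform_mutation` are hypothetical.

```python
import random


def delta(t, dv, t_p, b, r):
    """Step size that shrinks to zero as t approaches t_p (assumed form)."""
    return r * dv * (1 - t / t_p) ** b


def nonuniform_mutation(v, lower, upper, t, t_p, b=2.0, rng=random):
    """Mutate one real-valued gene v from the domain [lower, upper]."""
    r = rng.random()
    if rng.random() < 0.5:
        # Variant (A): move towards the upper bound.
        return v + delta(t, upper - v, t_p, b, r)
    # Variant (B): move towards the lower bound.
    return v - delta(t, v - lower, t_p, b, r)


mutated = nonuniform_mutation(0.5, 0.0, 1.0, t=3, t_p=10)
```

Early generations allow changes across the whole remaining distance to the domain bound, while for $t = t_p$ the step vanishes and the gene is left unchanged; in every case the result stays inside the original domain.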
E. Replacement strategies

According to the principle of GAs, a newly generated population substitutes 
the former one. The process is repeated until the desired termination condi- 
tion is satisfied (for instance, when the assumed number of evolution cycles 
has been performed). Two replacement strategies referred to as full or partial 
reproductions are presented in (Goldberg, 1989; Michalewicz, 1996). 

The strategy of full reproduction simply means that the newly created 
population (with N individuals) replaces the entire previous population. 

The strategy using partial reproduction consists in replacing merely the
worst individuals of the former population, i.e., those having the lowest fitness degrees,
by the newly created descendants (which can be interpreted in terms of
multi-generation progeny).
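
The difference between the two strategies can be shown with a small Python sketch. It assumes that each individual of the former population has already been given a scalar assessment (for instance its rank), which is a simplification of the vector fitness used in this chapter; the function names are illustrative.

```python
def full_reproduction(population, offspring):
    """The offspring replace the entire previous population."""
    return list(offspring)


def partial_reproduction(population, scores, offspring):
    """Replace only the worst individuals by the new descendants.

    scores -- scalar assessment of each individual (higher is better);
              this scalarisation is an assumption of the sketch
    """
    order = sorted(range(len(population)),
                   key=lambda i: scores[i], reverse=True)
    n_survivors = len(population) - len(offspring)
    survivors = [population[i] for i in order[:n_survivors]]
    return survivors + list(offspring)


new_population = partial_reproduction(
    ["a", "b", "c", "d"], [3, 1, 4, 2], ["x", "y"])
```

In the usage line the two weakest individuals, `"b"` and `"d"`, are replaced, while `"c"` and `"a"` survive into the next generation together with the two descendants.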



13.3.3. Genetic niching 

The process of optimal selection is the main result of the rivalry between
different individuals or species in nature. It raises hope for the survival
of particular individuals or species, and of the entire population. The selected
individuals of a high fitness have greater chances of producing offspring of
desirable features, similar to the parental ones. Moreover, such a continual
selection process leads to individuals of improved suitability. Quite surprisingly,
nature sometimes admits (with a small probability) the survival of weakly
fitted individuals. Such individuals are the source of the diversity of
the population, which allows introducing some innovations into the genetic
information transmitted to new generations (a kind of strange, seemingly
stochastic testing of non-optimal possibilities). In nature the time for
protecting weak individuals is usually short, and, as a result, they quickly disappear.
In order to sustain them, such weak species should be bred in an “artificial
environment”.

By analogy to the above, niching in genetic algorithms is a mechanism
that preserves (apart from the best individuals in terms of fitness) also
average and weak individuals in order to sustain diverse generations
(Goldberg, 1989; Kowalczuk et al., 1999; Kowalczuk and Bialaszewski, 2000).
Certainly, these individuals have to be given a chance to relay their genetic codes
to their offspring. The mechanism balances the populations of the existing
species by increasing the chance of mating for the individuals from sparse
(weaker) species/niches and decreasing that chance for the ones from dense
niches. In effect, such a mechanism of ‘uniform breeding’ prevents GAs from
premature convergence, as well as supports their ability to adapt.

The niche is a finite ‘ball’ in the space of parameters, in which at least
one individual is situated. It is assumed that geometrically close individuals
also have similar characteristics with respect to the degree of fitness. Thus
they can be recognised as species of a distinct sort. The degree of kinship
between two individuals can be represented by the closeness function (sharing
function; Goldberg, 1989) of the geometrical distance between them, which
takes its values from the range [0, 1]. The zero value of the closeness function
means that the two individuals are not related, i.e., they do not belong to
one species, while the unity value denotes their closest relation.

In particular, the niche identifying a species of ‘bred’ individuals is
defined as a hyperellipsoid in the parameter space. The closeness
function for the individuals $v_i$ and $v_j$ of the phenotypes $x_i, x_j \in \mathbb{R}^n$
($i, j = 1, 2, \ldots, N$), respectively, can be expressed as

$$\delta_{ij} = \begin{cases} 1 - \|x_i - x_j\|_P & \text{if } 0 \le \|x_i - x_j\|_P < 1, \\ 0 & \text{if } \|x_i - x_j\|_P \ge 1, \end{cases} \tag{13.29}$$

where

$$\|x_i - x_j\|_P = \sqrt{(x_i - x_j)^T P^{-1} (x_i - x_j)}, \tag{13.30}$$

$$P = \mathrm{diag}\{\, \phi_1 \;\; \phi_2 \;\; \ldots \;\; \phi_n \,\}, \tag{13.31}$$

while $\phi_k$ ($k = 1, 2, \ldots, n$) is the k-th diameter of the hyperellipsoid centred
on the i-th individual. In particular, this diameter can be determined as
follows:

$$\phi_k = \frac{\Delta_k}{\varepsilon}, \quad k = 1, 2, \ldots, n, \tag{13.32}$$

where $\Delta_k$ is the width of the real interval of the k-th parameter, while $\varepsilon$
represents a real positive factor.

In general, the niching technique consists in modifying the magnitude
of the fitness degree vector or the scalar rank (related to P-optimality) of
each individual in its ‘own’ niche according to the following (Goldberg, 1989;
Kowalczuk et al., 1999; Kowalczuk and Bialaszewski, 2000):

$$\tilde{f}(x_i) = \frac{f(x_i)}{\sum_{j=1}^{N} \delta_{ij}} \quad \text{or} \quad \tilde{\rho}(x_i) = \frac{\rho(x_i)}{\sum_{j=1}^{N} \delta_{ij}}, \tag{13.33}$$

where $f(x_i)$ is the vector fitness and $\rho(x_i)$ is the scalar rank of the i-th
individual, while $\tilde{f}(x_i)$ and $\tilde{\rho}(x_i)$ denote its corresponding ‘niche-adjusted’
fitness and rank, respectively. The sums in both denominators of (13.33)
concern the set of individuals in the dynamically determined niche centred
on the i-th individual (phenotype). If the individual is the only member of
its own niche, then its fitness degree is not decreased, as $\sum_{j} \delta_{ij} = 1$. In other
cases, the fitness degree is decreased according to the number of neighbours
in the niche.
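
The niching correction can be sketched in Python for scalar ranks (a fitness vector would be divided coordinate by coordinate in the same way). The sketch assumes the weighted norm $\sqrt{d^T P^{-1} d}$ with $P = \mathrm{diag}\{\phi_1 \ldots \phi_n\}$; the function names `closeness` and `niche_adjusted` are hypothetical.

```python
import math


def closeness(x_i, x_j, diameters):
    """Closeness (sharing) function of two phenotypes, with values in [0, 1]."""
    # Assumed norm: sqrt(d^T P^-1 d), P = diag(diameters).
    distance = math.sqrt(sum((a - b) ** 2 / phi
                             for a, b, phi in zip(x_i, x_j, diameters)))
    return 1.0 - distance if distance < 1.0 else 0.0


def niche_adjusted(values, phenotypes, diameters):
    """Divide each value by the niche count of its own individual."""
    adjusted = []
    for x_i, value in zip(phenotypes, values):
        # The individual itself contributes closeness 1, so the count is >= 1.
        niche_count = sum(closeness(x_i, x_j, diameters) for x_j in phenotypes)
        adjusted.append(value / niche_count)
    return adjusted


ranks = niche_adjusted([4.0, 4.0, 2.0],
                       [(0.0, 0.0), (0.0, 0.0), (10.0, 10.0)],
                       (1.0, 1.0))
```

In the usage line the first two individuals coincide and share one niche, so their ranks are halved, whereas the third, isolated individual keeps its rank.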



Example 13.5. Let us consider a two-dimensional searched space. Fig. 13.6
depicts the two-dimensional cube of the sought parameters $x_1 \in [\underline{x}_1, \overline{x}_1]$ and
$x_2 \in [\underline{x}_2, \overline{x}_2]$. The exploration ranges are $\Delta_1 = |\overline{x}_1 - \underline{x}_1|$ and $\Delta_2 = |\overline{x}_2 - \underline{x}_2|$.
The searched space has been divided into nine equal parts ($\varepsilon = 3$). In effect,
each individual gets its own niche in the form of an ellipse, as shown in
Fig. 13.7, with the diameters $\Delta_1/3$ and $\Delta_2/3$.




Fig. 13.6. Division of a two-dimensional cube (domain) of the sought parameters 




Fig. 13.7. Ellipsoidal niches 



Example 13.6. Consider, for illustrative purposes, an exemplary process of
niching a two-dimensional vector of the fitness degree. The arrangement of
12 individuals in a two-dimensional parameter space is shown in Fig. 13.8(a),
with their corresponding fitness vectors given in Fig. 13.8(b). Note that a
single Pareto-optimal solution is marked with a ‘star’. For this individual its
own niche with the radii 8/6 and 7/6 is also marked. Figure 13.8(c) presents
the niche-adjusted fitness vectors of individuals prepared for the ranking
selection. As a result of the niching mechanism, the fitness degrees of ten
individuals (the star and the circles) have been decreased (see the arrows) and
the fitness of two individuals (the cross and one dot) remains intact.

Fig. 13.8. Exemplary niching mechanism: (a) the population
and the ellipse niche of the optimal solution; (b) the true fitness
amongst the population; (c) the niche-adjusted fitness

As has been declared in (13.33), niching can concern either the fitness
or the rank of the individuals of the analysed population. The diagrams of both
algorithms are shown in Fig. 13.9. On the basis of the performed experiments
(Kowalczuk and Bialaszewski, 2000) we can characterise the niching of ranks
as a “stronger” mechanism. This effect immediately results from a direct
modification of the ranks and direct warping of the processed domination
structure (degree) in the Pareto sense, while, on the contrary, the niching
of fitness constitutes only a source of an indirect modification of the ranks
(changing the fitness vector need not alter its level of domination). Analogous
types of the niching mechanism can be considered with respect to the selected
parental pool (Kowalczuk and Bialaszewski, 2000), as explained in Fig. 13.10.

Fig. 13.9. Niching methods: (a) fitness of individuals (NF);
(b) ranks of individuals (NR)




Fig. 13.10. Methods of the niching of (a) the fitness
(NFP) and (b) the ranks (NRP) of the parents

As the niching operation allows both increasing the probability of
survival (i.e., appearance in the next generation) for species of ‘sparse’ niches
and decreasing it for ‘dense’ niches, the mechanism has the nature of ‘uniform
breeding’. Nevertheless, it is important that in spite of uniform breeding, as
a global effect of genetic expansion and selection procedures, we observe that
there are constant densities sustained in certain niches. This can eventually
be interpreted in terms of their robustness to changes in the fitness measure.

13.3.4. Full cycle of the genetic algorithm with niching 

Having introduced the genetic notions and definitions, we present the full cycle of
the binary-code genetic algorithm with the niching mechanism
in the form of Procedure 13.4.

Procedure 13.4. 

1) Generate randomly (with a uniform distribution) an initial
population of $N$ individuals $V = [\, v_1 \; v_2 \; \ldots \; v_N \,]$, where $v_i =
[\, v_{1i} \; v_{2i} \; \ldots \; v_{ni} \,]^T$, $i = 1, 2, \ldots, N$, with usually $N \in [50, 100]$, is
the i-th haploidal individual (chromosome) in the population $V$, while
$v_{ki}$ ($k = 1, 2, \ldots, n$) denotes a random binary sequence (a segment of the
chromosome) of a fixed length which encodes the k-th optimised
parameter.

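2) Decode each genotype of the haploidal individual into its phenotype $x_i =
[\, x_{1i} \; x_{2i} \; \ldots \; x_{ni} \,]^T$, where

$$x_{ki} = \underline{x}_k + \frac{\overline{x}_k - \underline{x}_k}{2^{m_k} - 1} \sum_{j=0}^{m_k - 1} v_{ki}^{(j)}\, 2^{j}, \quad k = 1, 2, \ldots, n,$$

is the k-th coordinate of the phenotype $x_i$, the interval $[\underline{x}_k, \overline{x}_k] \subset \mathbb{R}$
denotes its domain, $v_{ki}^{(j)} \in \{0, 1\}$ is the j-th bit of the sequence $v_{ki}$,
while $m_k$ represents the length of the sequence $v_{ki}$
encoding the coordinate $x_{ki}$.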

3) Compute the fitness degree vector:

$$f(x_i) = [\, f_1(x_i) \; f_2(x_i) \; \ldots \; f_m(x_i) \,]^T \in \mathbb{R}^m,$$

a) if the l-th coordinate $f_l(x_i)$, $l = 1, 2, \ldots, m$, is a cost function, then
apply the following mapping:

$$f_l(x_i) \leftarrow C_{\max} - f_l(x_i),$$

where $C_{\max}$ denotes the maximal value of $f_l(x_i)$ in the present
population,

b) if the l-th coordinate $f_l(x_i)$ is a profit function, then the following
transformation is applied:

$$f_l(x_i) \leftarrow f_l(x_i) - C_{\min},$$

where $C_{\min}$ stands for the minimal value of $f_l(x_i)$ in the current
population.
4) If the assumed number of generations (iterations) has been reached, then
finish the algorithm and select the P-optimal individuals (the primary
Pareto-front) from the current population; otherwise continue the
procedure from Step 5.



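5) Carry out the niching of individuals with respect to their fitness degree as
follows:

$$\tilde{f}(x_i) = \frac{f(x_i)}{\sum_{j=1}^{N} \delta_{ij}},$$

where

$$\delta_{ij} = \begin{cases} 1 - \|x_i - x_j\|_P & \text{if } 0 \le \|x_i - x_j\|_P < 1, \\ 0 & \text{if } \|x_i - x_j\|_P \ge 1, \end{cases}$$

$$\|x_i - x_j\|_P = \sqrt{(x_i - x_j)^T P^{-1} (x_i - x_j)},$$

$$P = \mathrm{diag}\{\, \phi_1 \;\; \phi_2 \;\; \ldots \;\; \phi_n \,\},$$

while $\phi_k$ ($k = 1, 2, \ldots, n$) is the k-th diameter of the hyperellipsoid centred
on the i-th individual, which can be set according to the formula $\phi_k =
\Delta_k / \varepsilon$, with a non-zero range $\Delta_k$ of the optimised k-th parameter and a
chosen factor $\varepsilon$.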



6) Assign each individual a scalar rank $\rho(x_i)$ by ranking them in terms of
P-optimality:

$$\rho(x_i) = \mu_{\max} - \mu(x_i) + 1,$$

where

$$\mu_{\max} = \max_{i = 1, 2, \ldots, N} \mu(x_i),$$

while $\mu(x_i)$ is the degree of domination (the number of individuals which
dominate over $x_i$ in the Pareto sense), and $\mu_{\max}$ denotes the maximum
value of $\mu(x_i)$ in the population.

7) Choose the parental pool $V_p$ from the population $V$ of individuals
according to the proportional method with the stochastic-remainder choice:

a) determine the expected number of copies of each individual in the population:

$$e(x_i) = \frac{\rho(x_i)}{\sum_{j=1}^{N} \rho(x_j)}\, N, \quad i = 1, 2, \ldots, N,$$

b) copy $N_{\mathrm{int}} = \sum_{i=1}^{N} \lfloor e(x_i) \rfloor$ individuals to the parental pool based on
the integer parts of the numbers $e(x_i)$, and define the number of
‘vacancies’ $\bar{N} = N - N_{\mathrm{int}}$,

c) set the distribution function of individuals according to the fractional
part of $e(x_i)$:

$$q(x_i) = \frac{1}{\bar{N}} \sum_{j=1}^{i} \bigl( e(x_j) - \lfloor e(x_j) \rfloor \bigr),$$

d) perform $\bar{N}$ times the ‘turning’ of the simulated ‘roulette wheel’ by

• generating a random number $r \in [0, 1]$,

• selecting the individual that fulfils the condition
$q(x_{i-1}) < r \le q(x_i)$ (with $q(x_0) = 0$),

• copying the selected individual to the parental pool.

8) Create the offspring $V'$ from the parental pool by

a) generating pairs of parents, and performing their one-point crossover
with a fixed crossover probability $p_c \in [0.6, 1]$ via

• drawing a number $r \in [0, 1]$ from a uniform distribution (different
for each pair),

• crossing the pair if $r < p_c$ by

- generating a crossover point $j_c$, which belongs to the discrete
range $[1, m - 1]$, where $m$ is the length of the chromosome,

- exchanging the parts of the parental chromosomes delimited by the
$(j_c + 1)$-th bit and the $m$-th bit,

b) performing binary mutation with a fixed probability $p_m$ via

• generating for each gene a number $r \in [0, 1]$ from a uniform
distribution,

• mutating if $r < p_m$, that is, negating this bit (usually $p_m$ is
inversely proportional to the size of the population).

9) Replace the ‘old’ population by the newly formed population of the
offspring $V'$ according to the full reproduction strategy, i.e., $V \leftarrow V'$, and
return to Step 2).
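
The ranking, selection and binary-operator steps of the procedure above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: all objectives are taken to be maximised, the function names are hypothetical, the default `p_c = 0.8` is merely one value from the quoted range, and the roulette wheel is normalised by scaling the random number with the sum of the fractional parts.

```python
import random


def dominates(a, b):
    """True if objective vector a Pareto-dominates b (assumption: all maximised)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_ranks(objectives):
    """Step 6: rank = mu_max - mu + 1, where mu counts the dominating individuals."""
    mu = [sum(dominates(other, own) for other in objectives) for own in objectives]
    mu_max = max(mu)
    return [mu_max - m + 1 for m in mu]


def stochastic_remainder_selection(ranks, rng):
    """Step 7: return the indices of the individuals copied to the parental pool."""
    n = len(ranks)
    total = sum(ranks)
    expected = [n * r / total for r in ranks]      # 7a) expected numbers of copies
    pool = []
    for i, e in enumerate(expected):               # 7b) copies from integer parts
        pool.extend([i] * int(e))
    vacancies = n - len(pool)
    fractions = [e - int(e) for e in expected]     # 7c) weights of the wheel
    frac_total = sum(fractions)
    for _ in range(vacancies):                     # 7d) one spin per vacancy
        r = rng.random() * frac_total
        cumulative = 0.0
        chosen = n - 1                             # fallback against rounding error
        for i, frac in enumerate(fractions):
            cumulative += frac
            if r < cumulative:
                chosen = i
                break
        pool.append(chosen)
    return pool


def one_point_crossover(parent_a, parent_b, rng, p_c=0.8):
    """Step 8a: with probability p_c swap the tails behind a random cut point."""
    if rng.random() < p_c:
        cut = rng.randint(1, len(parent_a) - 1)    # discrete range [1, m - 1]
        return parent_a[:cut] + parent_b[cut:], parent_b[:cut] + parent_a[cut:]
    return parent_a[:], parent_b[:]


def bit_mutation(chromosome, rng, p_m):
    """Step 8b: negate each bit independently with probability p_m."""
    return [1 - bit if rng.random() < p_m else bit for bit in chromosome]
```

For five hypothetical two-objective individuals `[(1, 1), (2, 2), (3, 1), (1, 3), (0, 0)]` the three non-dominated ones obtain the highest rank, and each of them is guaranteed one copy in the parental pool through the integer part of its expected count.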

Procedure 13.4'.

In the case of floating-point (multi-allele) parameter representation, Steps 1
and 2 in Procedure 13.4 are reduced to the following form:

1) Randomly generate the uniformly distributed initial population of $N$
individuals: $V = [\, v_1 \;\, v_2 \;\, \ldots \;\, v_N \,]$, where $v_i = [\, v_{1i} \;\, v_{2i} \;\, \ldots \;\, v_{ni} \,]^T$,
$i = 1, 2, \ldots, N$, for $N \in [50, 100]$, is the i-th haploidal individual in the
population $V$, while $v_{ki}$ ($k = 1, 2, \ldots, n$) denotes a random value
generated from the range of the k-th parameter, that is,
$v_{ki} \in [\underline{x}_k, \overline{x}_k]$ (as $v_i = x_i \in \mathbb{R}^n$).

Moreover, with floating-point representation, Step 8, responsible for the
accomplishment of genetic operations, also has to be modified as follows:

8) Create the offspring $V'$ from the parental pool by

a) generating the pairs of parents $v_r$ and $v_s$ ($r, s \in \{1, 2, \ldots, N\}$), and
performing for each pair $(v_r, v_s)$ arithmetical crossover with a fixed
probability $p_c \in [0.6, 1]$ via

• drawing a number $a \in [0, 1]$ from a uniform distribution,

• crossing the pair according to the following formula:

$$v'_r = a v_r + (1 - a) v_s \quad \text{and} \quad v'_s = a v_s + (1 - a) v_r,$$

b) performing the phenotype mutation of all individuals $v_q \in V'$
($q \in \{1, 2, \ldots, N\}$) with a fixed probability, done by the value-uniform
change of a selected parameter $v_{kq}$ to a new random value
$v'_{kq} \in [\underline{x}_k, \overline{x}_k]$ drawn from the domain of this parameter based on a
uniform distribution. Hence, the new individual can be described as

$$v'_q = [\, v_{1q} \;\, v_{2q} \;\, \ldots \;\, v'_{kq} \;\, \ldots \;\, v_{nq} \,]^T.$$
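
A minimal sketch of the two floating-point operators of Procedure 13.4' is given below. The function names and the mutation probability argument `p_m` are hypothetical, and the application of the crossover probability to each pair is left to the caller.

```python
import random


def arithmetic_crossover(v_r, v_s, rng):
    """Blend two parents with one weight a drawn uniformly from [0, 1)."""
    a = rng.random()
    child_r = [a * x + (1 - a) * y for x, y in zip(v_r, v_s)]
    child_s = [a * y + (1 - a) * x for x, y in zip(v_r, v_s)]
    return child_r, child_s


def uniform_mutation(v, bounds, rng, p_m):
    """With probability p_m redraw one randomly selected parameter from its range."""
    mutated = list(v)
    if rng.random() < p_m:
        k = rng.randrange(len(v))          # index of the parameter to change
        low, high = bounds[k]
        mutated[k] = rng.uniform(low, high)
    return mutated
```

Because both children are convex combinations of the parents, they stay inside the parameter hypercube whenever the parents do, so no repair step is needed after the crossover.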



13.4. Genetic algorithms in the multi-objective optimisation 
of detection observers 

To illustrate the applicability of the proposed approach, in this section we
consider the problem of designing linear state observers used in fault
detection systems. This issue is discussed based on two examples concerning
the control system of a remotely piloted aircraft and a ship propulsion
system.

13.4.1. State observers in FDI systems 

Fault Detection and Isolation (FDI) systems are used for diagnostic purposes.
Such systems are founded on two principal operations. The first of them
consists in detecting the occurrence of a fault (defect), while the second one
tries to isolate a particular fault from others. Such systems ensure a reliable
operation of other engineering assemblages for signal measurement as well as
system monitoring and control, for instance. This issue is of special
importance in systems of high safety (Patton et al., 1987; Chen et al., 1996;
Gertler and Kowalczuk, 1997). The presence of errors in system components may
be undesirable, or even dangerous. Sometimes, after a certain lapse of time,
even small system errors can have a serious effect on system performance.
Therefore, the detection and isolation of faults should be done as early as
possible, so as to allow a human operator to take appropriate steps.

The concept of FDI, based on mathematical models of the monitored
object (see Chapter 5), is founded on a current comparison of measurements
of the plant with certain signals predicted based on the object’s model. The
differences between the corresponding signals, called residues or residuals,
allow identifying the existing failures, faults or defects of the system. It is
assumed that those differences are, in general, influenced by disturbances,
noise and modelling errors. Fault detection is achieved by an appropriate
filtration of these residues, and principal diagnostic decisions are also made on
the basis of their appropriate evaluation. The scheme of a diagnostic system
founded on a state observer and an additional filter, referred to as a
residue generator, is shown in Fig. 13.11.



[Figure 13.11: block diagram with the blocks PLANT and RESIDUE GENERATOR (the latter composed of OBSERVER and FILTER) and the signal labels: control signals u(t), external disturbances d(t), faults f(t), measurement noise v(t) and system noise w(t), measurements y(t), modelling errors, estimate of the plant output, residue r(t).]

Fig. 13.11. General scheme of FDI systems



13.4.2. Design of residue generators 

Consider the following mathematical description of the monitored system:

$$\dot{x}(t) = A x(t) + B u(t) + N d(t) + F_1 f(t) + w(t), \tag{13.34}$$

$$y(t) = C x(t) + D u(t) + F_2 f(t) + v(t), \tag{13.35}$$

where $x(t) \in \mathbb{R}^n$ denotes a state vector, $u(t)$ is a control vector,
$y(t) \in \mathbb{R}^m$ stands for a measurement vector, $f(t)$ denotes a fault
vector, $d(t)$ is a state disturbance vector, while the signals $w(t) \in \mathbb{R}^n$
and $v(t) \in \mathbb{R}^m$ have a noisy character. The matrices $A$, $B$, $C$, $D$, $N$,
$F_1$ and $F_2$ appearing in the model (13.34)-(13.35) have suitable dimensions.
It is presumed that the pair $(A, C)$ is completely observable. It is also
postulated that the fault $f(t)$ is represented by an unknown vector time
function and that the influence of this fault on the state evolution of the
system considered and on the measurement signals is conditioned by the
choice of the matrices




538 



Z. Kowalczuk and T. Bialaszewski



$F_1$ and $F_2$, respectively. Considering, for example, a simple scheme with
actuator faults attributed to the control channel, we assume that $F_1 = B$
and $F_2 = D$, while in the case of sensor faults associated with the observation
channel we have $F_1 = 0$ and $F_2 = I_m$.

Assuming that the signals are unknown, the state observer can be
expressed in the following form (e.g., Brogan, 1991; Chen et al., 1996; Suchomski
and Kowalczuk, 1999):

$$\dot{\hat{x}}(t) = (A - KC)\hat{x}(t) + (B - KD)u(t) + K y(t), \tag{13.36}$$

$$\hat{y}(t) = C\hat{x}(t) + D u(t), \tag{13.37}$$

where $\hat{x}(t) \in \mathbb{R}^n$ is a state-vector estimate, $\hat{y}(t) \in \mathbb{R}^m$ constitutes an
estimated system output, while $K \in \mathbb{R}^{n \times m}$ stands for an observer gain
matrix. Consequently, the residual signal $r(t)$ can be obtained from the
following residual equation:

$$r(t) = Q\,(y(t) - \hat{y}(t)), \tag{13.38}$$

where a matrix $Q$ of weights, of a suitable dimension, serves as a ‘free’ design
parameter. The residue generator described by (13.36)-(13.38), which detects
faults in the plant modelled by the state equations (13.34)-(13.35), is presented in
Fig. 13.12.




Fig. 13.12. Residue generation based on state space equations 

The evolution of the state estimation error

$$e(t) = x(t) - \hat{x}(t), \qquad e(t) \in \mathbb{R}^n, \tag{13.39}$$

can be described by the “internal form” equation, conditioned by faults and
disturbances:

$$\dot{e}(t) = (A - KC)e(t) + (F_1 - KF_2)f(t) + N d(t) + w(t) - K v(t). \tag{13.40}$$

Thus, for an asymptotically stable homogeneous error equation, all
eigenvalues of $(A - KC)$ must have negative real parts. It can be easily shown
(ibidem) that the residual vector $r(t)$ of (13.38) can be interpreted as the
observation of the state estimation error $e(t)$ in the presence of the perturbing
signals $f(t)$ and $v(t)$, which can be expressed as

$$r(t) = QCe(t) + QF_2 f(t) + Q v(t). \tag{13.41}$$
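
The two relations (13.40) and (13.41) follow from a direct substitution, which can be sketched as below. The derivation uses only the plant model (13.34)-(13.35), the observer (13.36)-(13.38) and the definition (13.39); it assumes that the system noise $w(t)$ enters the state equation additively, as (13.40) implies. Time arguments are omitted for brevity.

```latex
\begin{align*}
\dot{e} &= \dot{x} - \dot{\hat{x}} \\
        &= A x + B u + N d + F_1 f + w
           - (A - KC)\hat{x} - (B - KD) u - K\,(C x + D u + F_2 f + v) \\
        &= (A - KC)(x - \hat{x}) + (F_1 - K F_2) f + N d + w - K v, \\[4pt]
r       &= Q\,(y - \hat{y})
         = Q\,(C x + D u + F_2 f + v - C \hat{x} - D u) \\
        &= Q C e + Q F_2 f + Q v .
\end{align*}
```

The control signal $u$ cancels in both expressions, which is why neither the error dynamics nor the residual depends on the applied control.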

The solution of (13.40) can be shown in the Laplace domain as

$$E(s) = [sI_n - (A - KC)]^{-1} \{ (F_1 - KF_2) F(s) + N D(s) + W(s) - K V(s) + e(0) \}, \tag{13.42}$$

where $F(s)$, $D(s)$, $W(s)$ and $V(s)$ are Laplace transforms of the
corresponding signals, while $e(0)$ denotes an initial value of the state estimation
error. As a result, the residue has the following Laplace form (Chen et al.,
1996; Kowalczuk and Suchomski, 1998; Kowalczuk and Bialaszewski, 1999;
Kowalczuk et al., 1999a; 1999b; 1999c):

$$R(s) = G_{rf}(s) F(s) + G_{rd}(s) D(s) + G_{rw}(s) W(s) + G_{rv}(s) V(s) + G_{re}(s) e(0), \tag{13.43}$$

where the matrix transfer functions are as follows:

$$G_{rf}(s) = Q \{ C [sI_n - (A - KC)]^{-1} (F_1 - KF_2) + F_2 \}, \tag{13.44}$$

$$G_{rd}(s) = QC [sI_n - (A - KC)]^{-1} N, \tag{13.45}$$

$$G_{rw}(s) = QC [sI_n - (A - KC)]^{-1}, \tag{13.46}$$

$$G_{rv}(s) = Q \{ I_m - C [sI_n - (A - KC)]^{-1} K \}, \tag{13.47}$$

$$G_{re}(s) = QC [sI_n - (A - KC)]^{-1} = G_{rw}(s). \tag{13.48}$$



The above matrix transfer functions describe the influence of all critical
factors: faults, initial conditions of the state estimation process, external
disturbances and/or modelling uncertainties. The steady-state value $r(\infty)$ of the
residual vector is

$$r(\infty) = G_{rf}(0) f(\infty) + G_{rd}(0) d(\infty) + G_{rw}(0) w(\infty) + G_{rv}(0) v(\infty), \tag{13.49}$$

where

$$G_{rf}(0) = Q [ C (A - KC)^{-1} (KF_2 - F_1) + F_2 ], \tag{13.50}$$

$$G_{rd}(0) = -QC (A - KC)^{-1} N, \tag{13.51}$$

$$G_{rw}(0) = -QC (A - KC)^{-1}, \tag{13.52}$$

$$G_{rv}(0) = Q [ I_m + C (A - KC)^{-1} K ]. \tag{13.53}$$
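
The steady-state formula (13.50) can be checked numerically on a scalar example. The sketch below is not taken from the source: it simulates a hypothetical first-order plant and its observer with a forward Euler scheme, with $D = 0$ and without noise or disturbances, and compares the final residual with the scalar version of (13.50). All numerical values and function names are illustrative assumptions.

```python
def simulate_residual(a, b, c, k, q, f1, f2, u=1.0, fault=1.0,
                      dt=1e-3, steps=10_000):
    """Euler simulation of a scalar plant and observer; returns the last residual."""
    x, x_hat, r = 0.0, 0.0, 0.0
    for _ in range(steps):
        y = c * x + f2 * fault                           # measurement, cf. (13.35)
        y_hat = c * x_hat                                # estimated output, cf. (13.37)
        r = q * (y - y_hat)                              # residual, cf. (13.38)
        x_dot = a * x + b * u + f1 * fault               # plant, cf. (13.34)
        x_hat_dot = (a - k * c) * x_hat + b * u + k * y  # observer, cf. (13.36)
        x += dt * x_dot
        x_hat += dt * x_hat_dot
    return r


def steady_state_fault_gain(a, c, k, q, f1, f2):
    """Scalar version of (13.50): q * [c * (k*f2 - f1) / (a - k*c) + f2]."""
    return q * (c * (k * f2 - f1) / (a - k * c) + f2)
```

With the hypothetical values `a = -1`, `b = c = q = 1`, `k = 2` and a unit step fault in the control channel (`f1 = b`, `f2 = 0`), the observer pole lies at `a - k*c = -3` and both functions give a residual of one third.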



As the matrices $K$ and $Q$ constitute the underlying parameterisation
of the designed detector, it is necessary to choose the values of the entries of
those matrices such that they will emphasise the influence of $F(s)$ on $R(s)$
and, at the same time, restrict the impact of the remaining factors on $R(s)$.
In order to define such tasks of parametric optimisation, let us consider the
following weighted partial-objective functions in the whole frequency domain
(different from Chen et al., 1996):

$$J_1(K, Q) = \| W_1(s)\, G_{rf}(s) \|_\infty, \tag{13.54}$$

$$J_2(K, Q) = \| W_2(s)\, G_{rd}(s) \|_\infty, \tag{13.55}$$

$$J_3(K, Q) = \| W_3(s)\, G_{rw}(s) \|_\infty, \tag{13.56}$$

$$J_4(K, Q) = \| W_4(s)\, G_{rv}(s) \|_\infty, \tag{13.57}$$

$$J_5(K) = \| (A - KC)^{-1} \|_s, \tag{13.58}$$

$$J_6(K) = \| (A - KC)^{-1} K \|_s, \tag{13.59}$$

with the following matrix norms:

$$\| M(s) \|_\infty = \sup_{\omega} \bar{\sigma} [M(j\omega)], \tag{13.60}$$

$$\| M \|_s = \bar{\sigma} [M], \tag{13.61}$$

where $\bar{\sigma}[\cdot]$ is the maximum singular value of the matrix argument.
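
For a scalar transfer function the maximum singular value in (13.60) reduces to the magnitude $|G(j\omega)|$, so the norm can be approximated by a simple frequency sweep. The sketch below is an illustration only: the grid limits, the number of points and the lightly damped test system are hypothetical choices, and a finite grid yields a lower bound of the true supremum.

```python
def hinf_norm_siso(transfer, w_min=1e-2, w_max=1e2, points=2000):
    """Approximate sup_w |G(jw)| on a logarithmic frequency grid."""
    best = 0.0
    for i in range(points):
        w = w_min * (w_max / w_min) ** (i / (points - 1))  # log-spaced frequency
        best = max(best, abs(transfer(1j * w)))
    return best


def lightly_damped(s):
    """Hypothetical test system G(s) = 1 / (s^2 + 0.2 s + 1)."""
    return 1.0 / (s * s + 0.2 * s + 1.0)


peak = hinf_norm_siso(lightly_damped)
```

For this test system the exact resonance peak equals $1/(2\zeta\sqrt{1-\zeta^2})$ with $\zeta = 0.1$, so `peak` approaches this value from below as the grid is refined. A matrix-valued transfer function would require a singular value decomposition at each grid frequency instead of `abs`.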

The weighting matrix functions $W_1(s)$, $W_2(s)$, $W_3(s)$ and $W_4(s)$,
which represent the prior knowledge about the spectral properties of the
process, introduce additional degrees of freedom into the detector design
procedure. Those matrices permit a spectral separation of the effects of faults
and noise. In order to maximise the influence of faults at low frequencies and
minimise the noise effect at high frequencies, the matrix function $W_1(s)$
should have a low-pass property. The weighting function $W_2(s)$ should have
the same properties, while the spectral effect of $W_3(s)$ and $W_4(s)$ should
be opposite to that of $W_1(s)$.

Once we have fixed the weighting matrices $W_1(s)$, $W_2(s)$, $W_3(s)$ and
$W_4(s)$, the synthesis of the detection filter boils down to the issue
of the multi-objective optimisation of the pair $(K, Q)$ with
regard to the goal expressed by

$$\underset{(K,Q)}{\mathrm{opt}}\; J(K, Q) =
\begin{bmatrix}
\max_{(K,Q)} J_1(K, Q) \\
\min_{(K,Q)} J_2(K, Q) \\
\min_{(K,Q)} J_3(K, Q) \\
\min_{(K,Q)} J_4(K, Q) \\
\min_{K} J_5(K) \\
\min_{K} J_6(K)
\end{bmatrix}. \tag{13.62}$$

It is clear that with the assumed $\mathrm{spectr}[A - KC] \subset \mathbb{C}_-$, the set of
complex values with a negative real part, the inverse matrix $[A - KC]^{-1}$
exists. The profit index $J_1(K, Q)$ constitutes the main maximised criterion
(with a hope for a similar beneficial effect on the lower bounds on the
minimum singular values), while the cost functions $J_2(K, Q)$, $J_3(K, Q)$ and
$J_4(K, Q)$ account for the state and output disturbance and noise effects.
The cost functions $J_5(K)$ and $J_6(K)$, describing the influence of static
deviations from the nominal model of the plant, represent important explicit
robustness measures.

The selection of the observer gain $K$ can be performed in several ways.
For example, the method of the eigenstructure assignment of the observation
system matrix $(A - KC)$ or a method based on the Kalman-Bucy filtering
can be applied (in the latter case, the knowledge of covariance
characteristics of noise perturbations in the analysed model is necessary). In this chapter
we follow the first approach (Chen et al., 1996), in which the whole spectrum
(the eigenvalues) of the observation system $(A - KC)$ is placed in the
required region of the complex plane, while assuring the necessary robustness of
this placement to the deviations $(\Delta A, \Delta C)$ from the nominal plant model.

We have to emphasise that in the ‘original’ FDI design problem the issue
of structural synthesis is not ‘complete’, in the sense that only a part of the
design freedom ‘supplied’ by the matrix $K$ is utilised during the design
process. Therefore, the spectral synthesis of the matrix $(A - KC)$ should
incorporate the additional task of the robust stabilisation of the observer
(which can be accomplished by genetic means) by considering a suitably
parameterised family of the pairs $\{(A + \Delta A, C + \Delta C)\}$, which map
the uncertainty of modelling (confer Chapter 6).

Chen et al. (1996) utilised the method of sequential inequalities (Zakian
and Al-Naib, 1973) in a multi-objective optimisation procedure performed
by GAs. In their approach, the cost indices are expressed in the frequency
domain and transformed into a set of inequality constraints, which are tested
for a finite set of frequencies. Eventually, the authors applied a genetic
algorithm to search for optimal solutions satisfying all inequality constraints.
In order to get an appropriate parameterisation of the gain matrix $K$, the
eigenstructure assignment approach (Liu and Patton, 1994; Chen et al., 1996)
was employed.

In our approach the analysed multi-objective optimisation problem is
solved by a method that incorporates both Pareto-optimality and the
genetic search in the whole frequency domain. In particular, the design of
residue generators is based on the optimisation of the objective function
$J(K, Q)$ of (13.62), whose coordinates are partial objectives: the profit
function $J_1(K, Q)$ and the cost functions $J_i(K, Q)$, $i = 2, 3, 4, 5, 6$. The ranking
method derived from P-optimality is employed to assess the P-optimal
solutions of this task generated by the genetic algorithm operating on multi-allele
codes (see Subsection 13.3.2).

To assure that genetic optimisation yields exclusively permissible
solutions, $\mathrm{spectr}[A - KC] \subset \mathbb{C}_-$, we directly search only for eigenvalues (and
not for the observer gain $K$ itself), on the basis of which the observer gain
matrix is calculated by means of the pole placement method (Kowalczuk et
al., 1999a).

The problem of multi-objective optimisation is thus reduced to the
following task:

$$\underset{(K,Q)}{\mathrm{opt}}\; J(K, Q) = \underset{K}{\mathrm{opt}}\; J(K) = \underset{\lambda}{\mathrm{opt}}\; J(K(\lambda)), \tag{13.63}$$

where $\lambda \in \mathbb{C}^n$ is the sought n-element vector of the eigenvalues of the matrix
$(A - KC)$.

In the approach applied, we presume that the coordinates of the i-th
individual $v_i$ in the population $V$ express the real or imaginary parts of the
eigenvalues, respectively:

$$\lambda_k = v_{ki} + j\, v_{li}, \tag{13.64}$$

$$\lambda_l = v_{ki} - j\, v_{li}, \tag{13.65}$$

where $\lambda_k$ and $\lambda_l$ denote the k-th and l-th eigenvalues of the composed
vector $\lambda$, while $v_{ki}$ and $v_{li}$ represent the two coordinates of the i-th
individual $v_i$.



13.4.3. Detection observers for the lateral control system 
of a remotely piloted aircraft 

As a first example of the application of the optimisation algorithm, let us
consider the task of the synthesis of a detection observer for the linearised
lateral control system of a remotely piloted aircraft. The basic aviation
parameters of the object considered are presented in Fig. 13.13, where three
axes, x, y, z, are distinguished. According to these axes we distinguish three
variables: $\phi$ as the bank angle round the axis x, $\psi$ as the slope angle round
the axis y, and $\zeta$ as the directional angle round the axis z. The following
useful relationships hold:

$$\beta = \frac{d\phi}{dt}, \qquad p = \frac{d\psi}{dt},$$

where $\alpha$ denotes a slide, which means the state of the aircraft flight evoked
by the velocity component (which brings about an increase in the falling
velocity and enables the aircraft to counteract a drift towards the ground
while flying in a cross-wind), $\beta$ represents a bank rate, and $p$ is a slope rate (a
rotation of a flying aircraft round its vertical axis z). Moreover, there are two
control variables, the position $r$ of the rudder (directional) and the aileron
angle $\delta$, associated with the aircraft actuating elements.

[Figure 13.13: drawing of the aircraft with the rudder marked.]

Fig. 13.13. Aircraft and its aviation parameters

The linearised model (13.34)-(13.35) of the plant (Mudge and Patton,
1988; Kowalczuk et al., 1999a) is defined by the respective matrices of the
state equations. The state vector has the form $x = [\, \alpha \;\, \beta \;\, p \;\, \phi \;\, \psi \,]^T$, while the
control signal is represented by $u = [\, r \;\, \delta \,]^T$.

In the example considered, we presume that the fault may occur in the
control channel (i.e., $F_1 = B$ and $F_2 = D$). Moreover, in the analysed
aircraft state model, we do not differentiate between the disturbance signals
$d(t)$ and the noises $w(t)$; instead, they are jointly modelled as one
signal $d(t)$. Therefore the coordinate $J_3(K, Q)$ does not appear in the
objective function vector (13.62) applied, and in the search for the P-optimal
detection observers we limit our optimisation procedure to the five criteria,
$J_i(K, Q)$, $i = 1, 2, 4, 5, 6$.

Diagonal matrices have been chosen to describe the weighting transfer
functions



$$W(s) = \mathrm{diag} \left\{ \frac{(0.1s + 1)(0.02s + 1)}{(0.005s + 1)^2 (0.001s + 1)} \right\}$$



which allow separating the consequences of faults and noise. To maximise the
fault effects at low frequencies and to minimise the noise effects in the high
frequency range, $W_1(s)$ has been designed as a low-pass filter and $W_4(s)$
denotes a suitable high-pass filter.



A. Results of genetic optimisation

The genetically sought vector of the eigenvalues of the matrix $(A - KC)$
takes the following parametric form:

$$\lambda(v_i) = [\, v_{1i} \;\;\, v_{2i} \;\;\, v_{3i} + j v_{4i} \;\;\, v_{3i} - j v_{4i} \;\;\, v_{5i} \,]^T \in \mathbb{C}^5,$$

where $v_{ki}$ is the k-th coordinate of the vector

$$v_i = [\, v_{1i} \;\;\, v_{2i} \;\;\, v_{3i} \;\;\, v_{4i} \;\;\, v_{5i} \,]^T \in \mathbb{R}^5,$$

which represents the i-th individual ($i = 1, 2, \ldots, N$) in the population. The
hypercube of the optimised parameters of the vector $v_i$ has been determined
in a five-dimensional space as follows:

$$v_1 \in [-5, -0.2], \quad v_2 \in [-15, -3], \quad v_3 \in [-10, -2], \quad v_4 \in [0.2, 4], \quad v_5 \in [-30, -8].$$
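
The mapping from an individual to the eigenvalue vector can be written down directly. The sketch below is illustrative: the function names are hypothetical, and the subsequent computation of the observer gain by pole placement is not shown. Since every range admitted for a real part is negative, any admissible individual yields a stable spectrum of $(A - KC)$ by construction.

```python
# Search ranges of the five coordinates, as listed in the text above.
BOUNDS = [(-5.0, -0.2), (-15.0, -3.0), (-10.0, -2.0), (0.2, 4.0), (-30.0, -8.0)]


def eigenvalues_from_individual(v):
    """Build [v1, v2, v3 + j v4, v3 - j v4, v5] from a five-element individual."""
    v1, v2, v3, v4, v5 = v
    return [complex(v1, 0.0), complex(v2, 0.0),
            complex(v3, v4), complex(v3, -v4), complex(v5, 0.0)]


def is_admissible(v, bounds=BOUNDS):
    """Check that every coordinate lies inside its search range."""
    return all(low <= x <= high for x, (low, high) in zip(v, bounds))


# Coordinates corresponding to the eigenvalues reported for observer no. 21.
observer_21 = [-1.583, -10.071, -4.015, 1.480, -21.013]
spectrum_21 = eigenvalues_from_individual(observer_21)
```

Encoding the complex pair through a shared real part and a positive imaginary part guarantees that the spectrum is closed under conjugation, so the resulting gain matrix can be real.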

As a result of the genetic multi-objective optimisation of the objective
function vector with the coordinates $J_i(K, Q)$, $i = 1, 2, 4, 5, 6$, a set of
P-optimal solutions has been obtained. The distribution of two selected
coordinates $(v_1, v_2)$ of all P-optimal solutions in their two-dimensional subspace
$(v_1, v_2)$ is depicted in Fig. 13.14, where the numbers next to some ‘dots’
correspond to the indices of the P-optimal observers. The corresponding values
of the two chosen objective functions (profit $J_1$ and cost $J_6$) are presented
in Fig. 13.15.




Fig. 13.14. P-optimal solutions in the selected two-dimensional subspace 
against the niches of three chosen solutions (21, 27, and 33) 




Fig. 13.15. Selected two-objective characterisation 
of Pareto-optimal solutions 

There are illustrative dense niches (the distinguished ellipses centred
on the 21st, 27th and 33rd solutions, respectively, with the radii of 0.8
and 2) depicted in Fig. 13.14. As can be seen, with the assumed criteria of
constructing the niches and a relatively wide range of the searched area, we
have obtained a limited number of ‘dominated’ niches (species). Moreover,
the demonstrated species can be interpreted in terms of the robustness or
‘confirmed Pareto-optimality’ of the final solutions: the individuals bred in
these niches are immune enough to survive in the cyclic genetic-evolution
process.

By means of this example we also illustrate a natural hazard that
is connected with Pareto-optimal solutions, which can be optimal in terms
of certain criteria and completely unfavourable from the viewpoint of other
objectives. By considering, for example, the analysed variant of the
problem (13.62), we observe that the non-dominated solution no. 26 shown in
Fig. 13.15 is characterised by one of the highest profits ($J_1$) and has, at the same
time, very unsuitable robustness properties ($J_6$).

B. Verification of the optimised detector

From among all of the Pareto-optimal solutions, the observer no. 21,
described by the following eigenvalues: $\lambda_1 = -1.583$, $\lambda_2 = -10.071$,
$\lambda_{3,4} = -4.015 \pm j1.480$, $\lambda_5 = -21.013$, has been chosen for further study.

The unstable system under consideration was stabilised with the aid of
a state feedback controller using the knowledge about the estimated system
state. The first coordinate of the fault vector $f(t)$, concerning the rudder
position, was subject to an additive fault of a sigmoid contour ($f$), shown in
Fig. 13.16.




Fig. 13.16. Residual signals $r_i$ obtained for
the observer gain matrix $K_{21}$ and the fault $f$

Simulations were performed in the presence of system and measurement
disturbances, $d(t)$ and $v(t)$, respectively, both modelled as zero-mean
Gaussian white-noise processes. The performance of the observer based on the gain
matrix $K_{21}$ is illustrated in Fig. 13.16. It should be emphasised that the
presented results concern the case of an uncertain plant model since the system
parameters of the (‘true’) plant applied were multiplicatively perturbed by a
uniformly-distributed ±10% deviation from their nominal values.

As can be seen from Fig. 13.16, the two fault-dependent coordinates
($r_1$, $r_3$) of the residual vector $r(t)$ demonstrate significant changes analogous
to the generic fault signal applied. The presented observation system can thus
be used for reliable detection of faults from noisy measurements. The
ultimate fault detection can be achieved by using, for example, appropriate
thresholds (here we have concentrated solely on the task of designing robust
observers).
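
The thresholding mentioned above is not specified further in the text. One simple possibility, given here purely as an assumed illustration, compares a moving average of the absolute residual with a fixed threshold; the function name, the window length and the threshold value are hypothetical.

```python
def detect_fault(residual, threshold, window=5):
    """Return the first sample index at which the moving average of |r| over
    `window` samples exceeds `threshold`, or None if it never does."""
    for k in range(window - 1, len(residual)):
        segment = residual[k - window + 1:k + 1]
        average = sum(abs(x) for x in segment) / window
        if average > threshold:
            return k
    return None
```

Averaging over a short window trades detection delay for immunity to isolated noise peaks: in a noise-free step from 0 to 1 at sample 20, a window of 5 and a threshold of 0.5 raise the alarm at sample 22.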



13.4.4. Fault detector for a ship propulsion system 

The ship propulsion system of a low-speed marine vehicle (Izadi-Zamanabadi
and Blanke, 1998; Kowalczuk and Bialaszewski, 1999) is taken as another
object of our consideration. It consists of one engine and one propeller. This
system is the basic mechanism of ship manoeuvring (acceleration and
braking). A failure of the propulsion unit may easily cause a dangerous event, such
as a collision with another ship or drifting to shallows, or various financial or
environmental losses. Such circumstances imply the necessity of monitoring
the propulsion system and taking appropriate steps in the case of operational
faults.

In such systems, apart from fault detection, it is necessary to accomplish
the isolation of faults (which is usually difficult). In particular, fault isolation
can refer to a shaft speed sensor or the diesel engine itself. The isolation
is necessary because each of the faults requires different steps to be taken. A
linearised continuous-time model of the ship propulsion system is depicted in
Fig. 13.17, where the following five blocks are distinguished: (a) the propeller-pitch
control system, (b) the diesel engine, (c) the shaft, (d) the propeller
and ship dynamics, and (e) the PI controller. The description of the system
parameters is given in Table 13.1.

Table 13.1. Parameters of the diagnosed ship propulsion system

Symbol                                    Description
$\theta$, $\theta_m$                      Propeller pitch angle and its measurement
$\theta_{ref}$                            Set-point for the propeller pitch angle
$\Delta\theta$                            Pitch-angle measurement fault
$\Delta\dot{\theta}$                      Leakage
$\upsilon$, $\upsilon_m$                  Ship speed and its measurement
$n$, $n_m$                                Shaft speed and its measurement
$\Delta n$                                Angular-velocity measurement fault
$Y$                                       Fuel index (level)
$Q_f$                                     Friction torque
$Q_{eng}$                                 Torque developed by the diesel engine
$T_{ext}$                                 External force representing the influence of wind and waves
$\nu_\theta$, $\nu_n$, $\nu_\upsilon$     Measurement noise for $\theta$, $n$, $\upsilon$
$M$                                       Shaft inertia
$m$                                       Ship weight
$k_t$                                     Pitch-angle control gain
$k_y$                                     Engine gain
                                          Engine time-constant
$a_\theta$, $a_n$, $a_\upsilon$,
$b_\theta$, $b_n$, $b_\upsilon$           Parameters of the steady state

The system presented in Fig. 13.17 can be described in the continuous-time
domain by the state-space model (13.34) and (13.35) with the
corresponding matrices and the following model signals:

$$x = [\, \theta \;\, n \;\, \upsilon \;\, Q_{eng} \,]^T, \quad u = [\, \theta_{ref} \;\, Y \,]^T, \quad d = [\, T_{ext} \,], \quad y = [\, \theta_m \;\, n_m \;\, \upsilon_m \,]^T,$$

$$f = [\, \Delta\theta \;\, \Delta\dot{\theta} \;\, \Delta n \,]^T, \quad w = [\, -k_t \nu_\theta \;\, 0 \;\, 0 \;\, 0 \,]^T, \quad v = [\, \nu_\theta \;\, \nu_n \;\, \nu_\upsilon \,]^T,$$

where $x$ is the state vector, $u$ denotes the control vector, $f$ stands for an
additive fault vector, $d$ denotes an unknown disturbance vector, $y$ is the
measurement vector, while the signals $w$ and $v$ have noisy characteristics.
The descriptions of $\theta$, $n$, $\upsilon$, $Q_{eng}$ and other elements of the model vectors
and matrices are given in Table 13.1.

The faults in the examined object are associated with the sensor of the
pitch angle of the propeller (a linear potentiometer), the sensor of the
angular velocity of the shaft (a tachometer), and the diesel engine itself
(Izadi-Zamanabadi and Blanke, 1998).

The sensor of the pitch angle $\theta$ of the propeller can indicate a fault
when:

a) generating too low a signal (with a negative deviation $\Delta\theta_{low}$) due to a
broken wire, short circuit, or shaft stack at a minimum pitch angle;

b) generating too high a signal (with a positive deviation $\Delta\theta_{high}$) due to a
broken wire, short circuit, or shaft stack at a maximum pitch angle.

Another kind of fault, indirectly associated with this sensor, is hydraulic
leakage, which can bring about a slow change of the propeller pitch
angle ($\Delta\dot{\theta}$).

The tachometer can generate the following faults:

a) generating a maximum signal value ($\Delta n_{max}$) due to an electromagnetic
disturbance,

b) generating a minimum signal value ($\Delta n_{min}$) due to a signal fade-out in
the converter.

The appearance of the above faults can have serious consequences. The
fault $\Delta\theta_{high}$ can cause a decrease in the ship velocity (or even braking), which
brings about a manoeuvring risk, a delayed operation of the ship, and an
increase in the operational costs. The failure $\Delta\theta_{low}$ brings about an increase
in the ship velocity, and even a danger of a collision. The fault $\Delta n_{max}$ can
impel a decrease in the ship velocity, which results in a delayed ship operation
during manoeuvring and increased operational costs, while, on the contrary,
the fault $\Delta n_{min}$ gives rise to an unintentional ship acceleration,
which can bring about a collision.



A. Results of genetic explorations

The vector of the genetically sought eigenvalues of the matrix $(A - KC)$ is
expressed as

$$\lambda(v_i) = [\, v_{1i} \;\;\, v_{2i} \;\;\, v_{3i} \;\;\, v_{4i} \,]^T,$$

where $v_{ki}$ is the k-th coordinate of the parameter vector

$$v_i = [\, v_{1i} \;\;\, v_{2i} \;\;\, v_{3i} \;\;\, v_{4i} \,]^T \in \mathbb{R}^4,$$

which is represented by the i-th individual ($i = 1, 2, \ldots, N$). The searched
ranges of the optimised parameters $v_k$ have been established as
$v_1 \in [-0.5, -0.1]$, $v_2 \in [-100, -80]$, $v_3 \in [-1.2, -0.7]$, $v_4 \in [-30, -15]$.

The weighting functions have been assumed to be of the following matrix
forms:

$$W_1(s) = W_2(s) = \mathrm{diag} \left\{ \frac{1}{(0.01s + 1)(0.05s + 1)} \right\},$$

$$W_3(s) = W_4(s) = \mathrm{diag} \left\{ \frac{(0.01s + 1)(0.05s + 1)}{(0.001s + 1)^2 (0.00001s + 1)^2} \right\},$$



which allow separating the influence of the faults and the noise. In order to
maximise the fault effects at low frequencies and to minimise the noise effects
in the high frequency range, the matrices $W_1(s)$ and $W_2(s)$ have been set
as low-pass filters and the functions $W_3(s)$ and $W_4(s)$ have been composed
as suitable high-pass filters.

As a result of the evolutionary search described above, a set of 78
Pareto-optimal individuals has been obtained. In order to assess the ultimate
solutions, the global optimality level $\eta$ has been utilised, which refers to the
attainable maximum values of the partial criteria $\{J_i\}$ (see also Procedure 13.1
and Examples 13.3 and 13.4).

The $\eta$-ordered set of the Pareto-optimal solutions obtained with respect
to an inverted optimality index $(1 - \eta)$ is depicted in Fig. 13.18. The dots
denote particular solutions. It is clear that the ordering drastically restricts
the number of valuable Pareto-optimal solutions. Consequently, only the
solutions with the highest optimality level ($\eta = 0.6$), i.e. no. 35, no. 36, no. 42,
no. 56 and no. 74, are accepted as the most useful ones.

[Figure 13.18: plot of the inverted optimality index $1 - \eta$ of the ordered solutions.]

Fig. 13.18. Distribution of P-optimal solutions
with respect to global optimality

B. Verification of results

Verifying simulations have been done for the observation system no. 35,
characterised by P-optimal sets of eigenvalues and objectives represented by the
following vectors:

$$\lambda(v_{35}) = [\, -0.2658 \;\;\, -1.0899 \;\;\, -21.7793 \;\;\, -90.0462 \,]^T,$$

$$J(K(\lambda(v_{35}))) = [\, 1.2\mathrm{e}{+}2 \;\;\, 1.9\mathrm{e}{-}6 \;\;\, 6.2\mathrm{e}{-}2 \;\;\, 1.3\mathrm{e}{+}1 \;\;\, 2.8\mathrm{e}{+}8 \,]^T.$$

A sequence of possible events in terms of additive faults (Izadi-Zamanabadi
and Blanke, 1998) is depicted in Fig. 13.19. External disturbances influencing
the object are presented in Fig. 13.20. The noise signals $w(t)$ and $v(t)$
affecting the states and measurements have been generated as zero-mean
Gaussian white-noise processes.

[Figure 13.19: time plots; horizontal axis: time (seconds).]

Fig. 13.19. Additive propeller-system faults concerning the
pitch angle, shaft speed, and leakage

[Figure 13.20: time plots; horizontal axis: time (seconds).]

Fig. 13.20. External disturbances representing
the friction torque and external force

Fig. 13.21. Residual signals

The obtained residual signals, $r_\theta = \theta - \hat{\theta}$, $r_n = n - \hat{n}$ and $r_\upsilon = \upsilon - \hat{\upsilon}$,
are shown in Fig. 13.21. The ellipses denote the moments of the prospective
detections of the additive faults. As can be easily seen, practically all the
faults of Fig. 13.19 give distinctive symptoms in at least one of the residues.
What is more, the residuals demonstrate changes analogous to the generic
fault signal applied. It is thus apparent that with the use of appropriate
filtration the symptom information included in the residues makes it possible
to detect and isolate all the faults. With the assumed model of disturbances,
the temporary pitch-sensor fault effects are less clear as compared to the
others. With a permanent fault rather than a periodic one, or with a lower level of
the noise signals, the related diagnosability would certainly be higher.

Additionally, a multiplicative fault related to a 20% loss of the diesel-engine
gain $k_y$ has also been simulated at 3000 s. This, however, has not
been detected by the system, which was based on the linear system model
with additive faults.



13.5. Summary 

The presented evolutionary/genetic approach to the multi-objective
synthesis of detection observers appears to be a valuable design tool of practical
effectiveness. This approach can be used for the global multi-objective search
in multi-dimensional parameter spaces for Pareto-optimal solutions
(representing the parameters of the detection observers, for example). The method
is immune to the possible discontinuity or multimodality of partial objective
functions. The ranking performed with respect to Pareto-optimality permits
a universal estimation of the vector objective functions. Furthermore, it is
worth mentioning that the multiple solutions yielded by the proposed
Pareto-optimal approach, which is generally criticised for its non-uniqueness, find
their full usefulness in the process of ranking the individuals of the parental
pool in each evolutionary cycle.

For the purpose of making the final evaluation of the obtained outcomes,
we propose to order all the Pareto-optimal solutions with the use of a
global-optimality level/index, which gives a scalar measure of solutions relative to
attainable maximum values of the respective partial quality indices. The
ranking procedure, resulting in the Pareto-optimal selection of solutions
(individuals), has been enriched with a niching mechanism, which can be implemented
in various variants. Niching allows an effective exploration of the parameter
space and prevents GAs from converging prematurely. Moreover, the
results of niching have a simple robustness interpretation, connected with the
preservation of the niches of densely populated species, despite following the
evolutionary policy of ‘uniform breeding’.

Finally, two example applications of the genetic approach to the design of
diagnostic systems, based on detection state observers, concerning the control
systems for a remotely piloted aircraft and a ship propulsion system have been
presented. The obtained simulation results have confirmed the efficiency of
the proposed approach to the residue generator design.


Chapter 14



PATTERN RECOGNITION APPROACH TO 
FAULT DIAGNOSTICS

Andrzej MARCINIAK*, Jozef KORBICZ*



14.1. Introduction 

Any parametric description of a diagnosed object includes only a small part 
of all existing state parameters; in fact, it is only a simplified model of reality. 
The best description is one that is in an equivalence relation with the given states
of the object. It means that the value x(p) of a given physical quantity from
the set X occurs if and only if the object is in the state m. In the case of 
real objects, it is often impossible to establish unambiguously these relations 
because the physical processes considered are not known adequately in their 
analytical form, or the parameter calculation is too complex computationally. 
It follows that fault diagnosis demands determining the relations existing 
between the measured symptoms (changes of the observed quantity relative to its
nominal value) and the faults (Calado et al., 2001; Frank and Koppen-Seliger,
1997; Isermann and Balle, 1997).

With this end in view, pattern recognition methods can be used to establish
these relations, assuming that the patterns of the object being in
a certain condition are closer to each other than the patterns of this object
in other conditions (despite measuring errors, the influence of random
components, etc.). Applying pattern recognition methods (Duda and Hart,
1973; Fukunaga, 1990) amounts to solving classification problems. The main
advantage of this approach is that designing diagnostic tools can be based
on numerical data (obtained from observations or measurements). This is very
important when the rules linking the causes and the corresponding effects
are not known.

In general, a given pattern recognition algorithm can form the base of the 
diagnostic system for static objects, or it can be used as a supplement for 
other diagnostic systems applied to the diagnosis of dynamic objects. The 
generalized pattern of process (object) diagnostics is illustrated in Fig. 14.1. 



[Figure blocks: Measurements; Symptoms/Features S = {s_1, s_2, ..., s_m}; Faults F]

Fig. 14.1. Generalized pattern of process diagnostics



Given the enormous breadth of the literature in the field of
pattern recognition, we describe only the classical approaches in this chapter, 
i.e., decision-theoretic algorithms. For that reason we intentionally omit the 
structural and syntactic approaches and rules-based inference methods (e.g., 
decision trees). 

Moreover, the methods of increasing the reliability of classifiers based on 
redundancy are presented. Experimental results are provided using charac- 
teristic classification data benchmarks connected with medical diagnostics. 



14.2. Classification in diagnostics 

Pattern recognition includes the problems of detection, perception and recog- 
nition of regularities in a set of parameters describing an object or event. Gen- 
erally, the aim of pattern recognition is to assign a physical object or event to 
one of several pre-specified categories (Duda and Hart, 1973). The recognition 
of an object as a unique singleton class is called identification, whereas the
process of grouping objects together into classes (subpopulations) according
to their similarities in the feature space is called classification (Looney, 1997). 

A pattern can be a picture, character, fingerprint, speech signal, ECG or 
a vector of measurements obtained from the sensors of the diagnosed object. 
In real applications, recognition is performed in the conditions of a lack of 
a priori information about the principles of assigning the objects to the 
classes. The only useful information is included in the learning set, composed 
of objects for which the correct classification is known. Therefore, pattern
recognition should consist in learning from examples so as to achieve the
best generalization ability. It means that learning should be performed in
such a way as to ensure the best recognition of the vectors from outside
the learning set. Let us notice that recognition is a form of reasoning, while 
classification is a form of learning. 

One can perceive the recognition process as a sequential one, consist- 
ing of the following three stages: measurement and pre-processing, feature 
extraction, and recognition (identification). The attributes of an object are 
sensed, observed and measured to yield a pattern vector in the first stage. 
Since real objects are usually described by numerous attributes, the aim of 
feature extraction is to reduce the number of attributes/features by selection 
and/or transformation. The number of features can be crucial for the computational
complexity of the recognition process and, on the other hand, many
attributes can be strongly correlated with the others. As the complexity of 
the classifier and the complexity of its hardware implementation grow rapidly 
with an increase in the number of dimensions of the feature space, it is im- 
portant to base decisions only on the most essential information in the sense 
of discrimination ability. It is very difficult to find optimal solutions for the 
feature selection problem; thus one may say that selection often introduces 
an arbitrary element into the recognition process. The recognizer receives a 
feature vector as an input, and operates on it to produce an output that is a 
unique identifier (a number, codeword, vector, etc.) associated with the class 
to which the object belongs. 

Since an overview of feature selection and extraction methods is presented
in (Kittler, 1986), we describe in the next section only several techniques that 
are very important in medical and industrial diagnostics. 



14.3. Symptom extraction with time-series analysis 

In the case of many FDI systems designed for industrial processes, the only 
information about the system state is available from measurements generated 
by the sensors. The measured signals can be considered as time series and 
therefore adequate methods of feature extraction should be applied. 

The methods of feature extraction from time series such as cross-correlation
functions, power spectral analysis, Kalman filtering, autoregressive
moving average modelling, etc. not only provide features, but can also
remove superfluous information, i.e., they filter the data and amplify the
contrast in the data. Many of these methods are effectively applied to biological
signal processing (e.g., EEG or EGG) (Shiavi and Bourne, 1986). However, 
many of them are used on the assumption that the system considered is linear 
or can be linearized in several working points (Alippi and Piuri, 1996). 

Let n denote a discrete time where the real time t is nT and T is the
sampling interval. Let Y(m) and Y(f) denote respectively the frequency
and power spectra, where m is the frequency number and f is the real
frequency.

A valuable technique of waveform detection in a single channel is cross- 
correlation given by 



$$ y(n) = \sum_{m=1}^{N} s(m) * x(m - n), \tag{14.1} $$

where N is the number of points in the template, and * denotes the convolution
operator. A wavelet of the form s(n) exists when y(n) is more than
a given threshold. The templates are selected from population averages.
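
As a rough illustration of the template-matching idea behind (14.1), the following Python sketch slides a template over a signal and reports the lags at which the correlation exceeds a threshold. The function names, the lag convention (the template is compared with signal[m + n]) and the toy data are assumptions made for this example only.

```python
def cross_correlate(template, signal):
    """Return y[n] = sum_m template[m] * signal[m + n] for every full overlap."""
    n_t = len(template)
    return [
        sum(template[m] * signal[m + n] for m in range(n_t))
        for n in range(len(signal) - n_t + 1)
    ]


def detect(template, signal, threshold):
    """Return the lags n at which the correlation y[n] exceeds the threshold."""
    y = cross_correlate(template, signal)
    return [n for n, value in enumerate(y) if value > threshold]


# Hypothetical toy data: the template is buried in the signal at lag 2.
template = [1, 2, 1]
signal = [0, 0, 1, 2, 1, 0, 0]
lags = detect(template, signal, threshold=5)
```

In practice the threshold would be tuned on recorded data, and the template would be a population average as the text describes.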

A classical approach to Power Spectral Density (PSD) estimation is the 
periodogram: 



$$ S(m) = \frac{1}{N} X(m) X^{*}(m), \qquad X(m) = \sum_{k=0}^{N-1} x(k) \exp(-j 2\pi m k / N), \tag{14.2} $$

where X(·) and X*(·) are conjugated discrete Fourier transforms of the
measured signal x(k). Applying time windows can result in a reduction of
the unwanted leakage effect and estimation variance. PSD is often used for
signal modelling, feature extraction and filter specification. An example of
a filter specification application is Wiener filtering.
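
The periodogram can be sketched in a few lines of Python. This is a minimal illustration that assumes the common normalization S(m) = |X(m)|^2 / N and applies no time window; the function names are hypothetical, and the direct O(N^2) transform is used only to keep the example self-contained.

```python
import cmath
import math


def dft(x):
    """Discrete Fourier transform X(m) = sum_k x(k) exp(-j 2 pi m k / N)."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))
        for m in range(n)
    ]


def periodogram(x):
    """Raw periodogram S(m) = |X(m)|^2 / N (assumed normalization, no window)."""
    n = len(x)
    return [abs(value) ** 2 / n for value in dft(x)]


# Hypothetical test signal: two full cosine cycles over eight samples.
x = [math.cos(2 * math.pi * 2 * k / 8) for k in range(8)]
S = periodogram(x)  # the power concentrates in bins m = 2 and m = 6
```

With this normalization the periodogram values sum to the signal energy, which gives a quick sanity check on an implementation.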

The frequency-domain structure of a Wiener filter is

$$ H(f) = \frac{S(f)}{S(f) + N(f)}, \tag{14.3} $$



where S and N are the PSDs of the signal and noise, respectively. Wiener
filtering is very useful when the signal-to-noise ratio is small and the signal
waveform is indistinguishable. Certainly, the Wiener filter is valid for stationary
stochastic signals. Another approach to signal estimation is Kalman
filtering, described in other parts of this book. However, this requires some
knowledge of the signal model.

The concept of developing signal models that correspond to different sit- 
uations (symptoms) is used in the AutoRegressive (AR) modelling approach. 
Many models are developed and the decision which of them is best is made 
based on the ongoing signal epoch. This method is commonly used to detect
different symptoms from bioelectric measurements (e.g., EEG, EGG) (Shiavi
and Bourne, 1986). The AR model is described by

$$ x(k) = \sum_{i=1}^{N} a_i x(k - i) + e(k), \tag{14.4} $$

where the $a_i$'s are the model coefficients, N is the model order and e(k) is a
zero-mean white-noise process. The $a_i$ coefficients are derived by minimizing
(e.g., with the least-squares method) the prediction error given by

$$ e(k) = x(k) - \sum_{i=1}^{N} a_i x(k - i). \tag{14.5} $$

The minimization of (14.5) consists in finding the solution of the Yule-Walker
equations. AR model coefficients are useful features in pattern classification 
in both on-line and off-line approaches. For the off-line case, a signal is divided 
into epochs of equal length and coefficients are computed for each epoch. In 
the on-line case, as the signal is measured, the prediction error of all the 
models is computed simultaneously for a finite time. 
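
To make the off-line case concrete, here is a minimal Python sketch that estimates the AR coefficients of one epoch by ordinary least squares on the prediction error (14.5). It solves the normal equations directly instead of the Yule-Walker system; the helper names `solve` and `fit_ar` and the noise-free test series are assumptions of this example.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ar(x, order):
    """Least-squares estimate of a_1..a_order in x(k) = sum_i a_i x(k-i) + e(k)."""
    rows = [[x[k - i] for i in range(1, order + 1)] for k in range(order, len(x))]
    targets = [x[k] for k in range(order, len(x))]
    # Normal equations (R^T R) a = R^T t of the least-squares problem.
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(order)] for i in range(order)]
    rhs = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(order)]
    return solve(gram, rhs)


# Hypothetical noise-free AR(2) epoch with a_1 = 1.5 and a_2 = -0.7.
epoch = [1.0, 0.5]
for _ in range(28):
    epoch.append(1.5 * epoch[-1] - 0.7 * epoch[-2])
coefficients = fit_ar(epoch, 2)
```

The resulting coefficient vector of each epoch can then be used as a feature vector for the classifiers discussed later in the chapter.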

A better model of real data is provided by the AutoRegressive Moving 
Average (ARMA) model: 

$$ x(k) = \sum_{i=1}^{N} a_i x(k - i) + \sum_{i=0}^{M} b_i u(k - i) + e(k), \tag{14.6} $$

where u(k) is a different time series. Generally, ARMA requires a lower-order
model to fit the data but is more computation-consuming. Another instance
of a linear model is the AutoRegressive with eXogenous input (ARX) model,
which is described in further parts of this chapter. The main disadvantage 
of the AR, ARMA and ARX models is their linearity. Therefore, in many 
real applications a non-linear neural network is used instead of linear models. 
Since both neural network prediction and neural network estimation schemes 
are described in other chapters, they will not be elaborated on here. 



14.4. Pattern recognition methods 

Suppose that there are r reference patterns $\{s_{ref}, f_{ref}\}$ in the learning set
U, where s is an element of the multidimensional discrete symptom space
$S \subset \mathbb{R}^N$. Furthermore, let there be q faults (or state vectors) f, being the
elements of the fault (state) space F, where $F \subseteq \{0,1\}^q$. Thus the fault
vectors are binary encoded in such a way that 1 on a given position means
the occurrence of the corresponding fault on this position.

Among the classical pattern recognition methods used in fault diagnosis,
one might distinguish the decision-theoretic approach (statistical) and approximation
methods. The decision-theoretic approach can be categorized into
two groups of methods: parametric and nonparametric (minimal-distance)
ones.

14.4.1. Minimal-distance methods 

14.4.1.1. Measures of distance in the multidimensional 
symptom space 

In this group of methods, the relations between the detected symptoms and 
faults are strictly connected with the concept of distance in the symptom 
space $S \subset \mathbb{R}^N$. This space should be equipped with a proper metric, described
by

$$ \rho : S \times S \to \mathbb{R}. \tag{14.7} $$

A metric is a mathematical function that associates with each pair of elements 
of a set a real non-negative number with the general properties of distance 
such that the number is zero only if the two elements are identical. The 
number is the same regardless of the order in which the two elements are 
arranged, and the number associated with one pair of the elements plus that 
associated with one member of the pair and a third element is equal to or 
greater than the number associated with the other member of the pair and 
the third element. These conditions can be written as

$$ \begin{aligned} \rho(s_i, s_j) &= \rho(s_j, s_i), \\ \rho(s_i, s_j) &= 0 \iff s_i = s_j, \\ \rho(s_i, s_j) &\le \rho(s_i, s_k) + \rho(s_k, s_j), \end{aligned} \tag{14.8} $$

for each vector $s_i \in S$, $i = 1, 2, \ldots$ It is easy to observe that infinitely
many mappings satisfy the foregoing conditions. Hence the problem
of the metric selection can be solved only in an empirical way. At the same
time, one should emphasize that the metric selection has a crucial influence
on results in real applications.

Metrics can appear as similarity measures of two different vectors but, in
fact, many of them are rather dissimilarity measures, since a greater value 
means greater distance and dissimilarity (Looney, 1997). 

The most commonly used metric is the Euclidean distance, defined by 



$$ \rho_E(s_i, s_j) = \sqrt{\sum_{n=1}^{N} (s_{i,n} - s_{j,n})^2}. \tag{14.9} $$



The Euclidean distance is useless in the case of significant differences between
the ranges of the variables from the symptom vector. A variable with a much
wider range can dominate the influence of other variables and have a decisive
impact on the distance value. Therefore, in practice, the generalized
Euclidean distance is often used, which is

$$ \rho_{Eu}(s_i, s_j) = \sqrt{\sum_{n=1}^{N} \left[ \lambda_n (s_{i,n} - s_{j,n}) \right]^2}, \tag{14.10} $$

where the multipliers $\lambda_n$ can be calculated with

$$ \lambda_n = \left( \max_{s \in U} s_n - \min_{s \in U} s_n \right)^{-1}. \tag{14.11} $$



Since squaring and rooting a number require many calculations, other useful 
metrics can be used. The Manhattan metric (city block norm) is given by 

$$ \rho_{CB}(s_i, s_j) = \sum_{n=1}^{N} |s_{i,n} - s_{j,n}|. \tag{14.12} $$



Another simple metric is the Tchebyshev norm (a supremum or maximum 
norm), derived via 

$$ \rho_C(s_i, s_j) = \max_{n} |s_{i,n} - s_{j,n}|. \tag{14.13} $$

The above metrics are specific instances of the Minkowski norm, defined as 



$$ \rho_M(s_i, s_j) = \left( \sum_{n=1}^{N} |s_{i,n} - s_{j,n}|^t \right)^{1/t}. \tag{14.14} $$



By changing t one can fit the metric to the specificity of the problem considered.
However, a significant disadvantage of the Minkowski norm is the
assumption that the basis of the symptom (feature) space is orthogonal. It is
easy to observe that in most cases the elements of the symptom vector s are
correlated with each other, and then the Mahalanobis metric seems a proper
solution, which is defined by



$$ \rho_{Mh}(s_i, s_j) = \sqrt{(s_i - s_j)^T C^{-1} (s_i - s_j)}, \tag{14.15} $$



where C is the feature covariance matrix and can be derived from

$$ C_{a,b} = \frac{1}{K - 1} \sum_{k=1}^{K} (s_{k,a} - \mu_a)(s_{k,b} - \mu_b), \tag{14.16} $$

where a and b are feature indices, and $\mu$ is the average value of a given
feature within some subpopulation of K elements, i.e.,

$$ \mu_a = \frac{1}{K} \sum_{k=1}^{K} s_{k,a}. \tag{14.17} $$

On the condition that a proper transformation matrix is provided, the
Mahalanobis metric can be an instance of the square norm that takes into
consideration the different properties of the feature space. For example,
when the covariance matrix C is homoscedastic (a covariance matrix with 
identical elements on the main diagonal and zeroed elements outside of it), 
we obtain the generalized Euclidean distance. Using a transformation ma- 
trix which has non-zero elements only outside the main diagonal can lead to 
the Bhattacharyya metric, which is often pointed to as the optimal one for 
classification and clustering problems. 

In order to visualize distance measures we use the concept of a hypersphere,
which is a set of the form $S = \{s : \rho(s, z) = r\}$ centered on some
vector z with radius r in N-dimensional space. Figure 14.2 displays hyperspheres
in the plane for the metrics mentioned above.
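
The metrics discussed above can be compared on the same pair of vectors with a short Python sketch. The function names are hypothetical, the covariance is the sample estimate over a small reference set, and the Mahalanobis distance is obtained by solving a linear system rather than inverting C explicitly.

```python
import math


def minkowski(a, b, t):
    """Minkowski norm; t = 1 gives the Manhattan and t = 2 the Euclidean metric."""
    return sum(abs(x - y) ** t for x, y in zip(a, b)) ** (1.0 / t)


def chebyshev(a, b):
    """Tchebyshev (maximum) norm."""
    return max(abs(x - y) for x, y in zip(a, b))


def covariance(samples):
    """Sample covariance matrix with the 1 / (K - 1) normalization."""
    k, dim = len(samples), len(samples[0])
    mean = [sum(s[a] for s in samples) / k for a in range(dim)]
    return [
        [sum((s[a] - mean[a]) * (s[b] - mean[b]) for s in samples) / (k - 1)
         for b in range(dim)]
        for a in range(dim)
    ]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def mahalanobis(a, b, cov):
    """Mahalanobis distance sqrt(d^T C^-1 d) with d = a - b."""
    d = [x - y for x, y in zip(a, b)]
    z = solve(cov, d)  # z = C^-1 d
    return math.sqrt(sum(di * zi for di, zi in zip(d, z)))
```

For the pair (0, 0) and (3, 4), for instance, the Manhattan, Euclidean and Tchebyshev distances are 7, 5 and 4, which mirrors the nesting of the corresponding hyperspheres.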

Another useful distance measure is the Canberra distance, defined by

$$ \rho(s_i, s_j) = \sum_{n=1}^{N} \frac{|s_{i,n} - s_{j,n}|}{|s_{i,n} + s_{j,n}|}, \tag{14.18} $$




Fig. 14.2. Metrics and their hyperspheres in the plane: (a)
Euclidean metric, (b) Tchebyshev metric, (c) Mahalanobis
metric, (d) Manhattan metric

and the Hamming distance, which is a modification of the Manhattan metric
where the elements of the feature vector are binary values. Besides, it is
necessary to mention the likeness (similarity) measures $\gamma$, which vary in
inverse proportion to the dissimilarity measures $\rho$, i.e.,

$$ \gamma(s_i, s_j) = \frac{1}{\rho(s_i, s_j)} \quad \text{for } \rho(s_i, s_j) \neq 0. \tag{14.19} $$

Using the likeness measure instead of the dissimilarity one should be accompanied
by a change of the decision rule in the classification algorithm (e.g.,
min → max). Some commonly used likeness measures are the direction cosine,
defined as

$$ \gamma(s_i, s_j) = \frac{s_i^T s_j}{\|s_i\| \, \|s_j\|}, \tag{14.20} $$

where ||s|| is a norm of vector s, and Tanimoto’s function, derived via

$$ \gamma(s_i, s_j) = \frac{s_i^T s_j}{s_i^T s_i + s_j^T s_j - s_i^T s_j}. \tag{14.21} $$



Unfortunately, there are no general rules for picking out the optimal metric 
in a given problem. Although the metric is usually selected arbitrarily or 
experimentally, there are some rules that can be helpful. For example, analysis 
in the time domain with numerous elements of the feature vector should not 
be performed with the Mahalanobis metric. Furthermore, when the variables
from the feature vector have different ranges, the Minkowski metric for t > 2
is not recommended, just like in the case of strongly correlated features.
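
The two likeness measures named above can be sketched as follows. This is a small Python illustration in which the cosine measure uses the Euclidean norm; the function names and test vectors are assumptions of the example.

```python
import math


def dot(a, b):
    """Inner product of two vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b (Euclidean norm assumed)."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def tanimoto(a, b):
    """Tanimoto's function a.b / (a.a + b.b - a.b)."""
    return dot(a, b) / (dot(a, a) + dot(b, b) - dot(a, b))
```

Both measures grow with similarity, so a classifier built on them picks the maximum rather than the minimum.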



14.4.1.2. Minimal-distance methods 

The most commonly known minimal-distance algorithm is based on the Near- 
est Neighbor (NN) rule and is named likewise. Its concept is presented in 
Fig. 14.3, where different symbols indicate patterns belonging to various 
classes. Training the NN classifier consists in preparing the set of reference 
patterns. Given a new symptom vector, the recognition task is to assign it to 
one of the classes according to its minimal distance to one of the reference 
patterns, i.e.,

$$ s \in F_i \quad \text{when} \quad \exists_k \; \rho(s, s_{i,k}) = \min_{j,k} \rho(s, s_{j,k}), \tag{14.22} $$

where i is the class (fault/state) index. A significant disadvantage of this 
classifier is its sensitivity to misclassified reference patterns or measuring 
errors (a single outlier can cause wrong classification). Another problem is 
the possibility of ties in the distances between two or more patterns resulting 
at least from the finite computer precision. 

Fig. 14.3. Concept of the Nearest Neighbor rule

The k-Nearest Neighbor (k-NN) algorithm (Fukunaga, 1990) is less sensitive
to these errors. After finding the distances from an unknown symptom
vector to its k nearest neighbors from the reference set U, each neighbor
votes for its own class. The class with the majority of votes wins, which we
can write as

$$ s \in F_i \quad \text{when} \quad k_i = \max_j k_j \quad (i, j = 1, \ldots, M), \tag{14.23} $$

where $k = \sum_j k_j$. The parameter k is given arbitrarily, but its value should
be much smaller than the cardinality of the subset composed of the
patterns of the smallest class in the reference set, i.e.,

$$ k \ll \min_i |U_i|. \tag{14.24} $$

An example of a wrong value of the parameter k is illustrated in Figure 14.4. 

The advantages of the analysed methods are their simplicity, intuitiveness, 
and comparatively good recognition rates. Therefore, these techniques are 
often used to estimate the Bayes error, i.e., the overlap between different class 
densities. Preliminary results for the k-NN algorithm allow evaluating the
quality of the feature space, even when a different classifier is used eventually. 
For example, when features are extracted or selected, the Bayes error in the 
feature space must be compared with the one in the original measurement 
space in order to determine whether the new set of features is acceptable. 

The disadvantages of the k-NN algorithm are that it requires much storage
space, because the entire reference set is located in the memory, and the
computations take a lot of time, because the distances between an unknown 
vector and all the reference patterns need to be calculated. 
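
A minimal Python sketch of the NN and k-NN rules is given below. It assumes the Euclidean metric and a reference set stored as (pattern, label) pairs; the function name and the toy reference set, which contains one deliberately mislabelled outlier, are hypothetical.

```python
import math
from collections import Counter


def knn_classify(sample, references, k=1):
    """Classify `sample` by majority vote among its k nearest reference patterns.

    `references` is a list of (pattern, label) pairs; k = 1 gives the NN rule.
    Ties are resolved in favour of the label encountered first among the
    nearest neighbours.
    """
    nearest = sorted(references, key=lambda ref: math.dist(sample, ref[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical reference set: class "A" near the origin, class "B" far away,
# plus one outlier labelled "B" inside the region of class "A".
references = [
    ((0, 0), "A"), ((0, 1), "A"), ((1, 0), "A"),
    ((5, 5), "B"), ((5, 6), "B"), ((6, 5), "B"),
    ((1, 1), "B"),
]
nn_label = knn_classify((0.9, 0.9), references, k=1)
knn_label = knn_classify((0.9, 0.9), references, k=3)
```

With k = 1 the outlier decides the result, whereas with k = 3 it is outvoted. Every query scans the whole reference set, which is the storage and time cost noted above.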

These disadvantages can be eliminated by using prototypes (patterns, 
templates) for each class. The Nearest Mean (NM) method exemplifies the 
advantages of using the prototypes. Each class is represented by one center,
which is the sample mean of the patterns belonging to the analysed class and
is calculated from

$$ w_i = \frac{1}{|U_i|} \sum_{k=1}^{|U_i|} s_{i,k}. \tag{14.25} $$

In the case of the multimodal distribution of classes and when the repre- 
sentatives of individual classes form non-convex shapes in the feature space, 
the Nearest Mean method should not be used. Such a situation is shown in 
Fig. 14.5(b), where the class centers are represented by grey-filled symbols. 




Fig. 14.4. k-NN algorithm with a wrong value of the parameter k

Fig. 14.5. Concept of the Nearest Mean algorithm: (a) correct
and (b) incorrect usage

It is easy to notice that more than one center should be used as a
class representative. Clustering methods can be used in order to generate 
these class representatives. The most commonly used method is the k-means 
algorithm, which requires the number k of clusters to be given and consists 
of the following steps: 

1. Assign each sample vector to one of k randomly selected initial centers. 
Set the iteration counter t = 1.

2. Compute a new average as a new center for each cluster using (14.25), 
where i is the cluster index. 

3. Assign each sample vector to the cluster with the nearest center. Increase
the iteration counter: t = t + 1.

4. Check the stop criterion. If any center has changed in the last iteration, 
then go to step 2, otherwise terminate. 

After generating the cluster centers, the recognition process can be continued
using the NM method, i.e., by applying the nearest neighbor assignment
principle and respecting the higher number of representatives for each class.
Such an application of the k-means algorithm (for k = 2) is illustrated in
Fig. 14.6.
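
The four steps of the k-means procedure can be sketched in Python as follows. This is an illustrative version under stated assumptions: the initial centers are passed in by the caller instead of being drawn at random, the Euclidean metric is used, an empty cluster keeps its previous center, and the names are hypothetical.

```python
import math


def mean(points):
    """Component-wise sample mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def nearest(sample, centers):
    """Index of the center closest to `sample` (Euclidean metric)."""
    return min(range(len(centers)), key=lambda i: math.dist(sample, centers[i]))


def k_means(samples, initial_centers, max_iter=100):
    centers = list(initial_centers)
    # Step 1: assign each sample to one of the initial centers.
    assignment = [nearest(s, centers) for s in samples]
    for _ in range(max_iter):
        # Step 2: compute a new center as the mean of each cluster.
        new_centers = []
        for i, old in enumerate(centers):
            members = [s for s, a in zip(samples, assignment) if a == i]
            new_centers.append(mean(members) if members else old)
        # Step 3: reassign each sample to the cluster with the nearest center.
        assignment = [nearest(s, new_centers) for s in samples]
        # Step 4: stop when no center has changed in the last iteration.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assignment


# Hypothetical data: two well-separated groups of three patterns each.
samples = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, assignment = k_means(samples, [(0, 0), (10, 10)])
```

Running the routine separately on the patterns of each class yields several representatives per class, and `nearest` then plays the role of the NM assignment rule for new symptom vectors.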




Fig. 14.6. Generating class representatives using the k-means algorithm 



14.4.2. Statistical methods 

The phenomena accompanying state changes of diagnosed objects usually have a
statistical nature, which means that processes with specific parameter distributions
can be assigned to the same state $F_i$. Thus, statistical methods
can be applied provided that the conditional probabilities for the symptom
vector, given that it belongs to each state $F_i$, have the normal distribution.

Discriminant function analysis is used to classify cases into the values of
a categorical dependent variable, usually a dichotomy. Fisher (Fukunaga, 1990) was
the first to propose a procedure for a two-group case based on maximizing
the separation between the groups like in the case of the analysis of variance.
Fisher’s linear discriminant function is given by

$$ L = (w_1 - w_2)^T C^{-1} s - \frac{1}{2} (w_1 - w_2)^T C^{-1} (w_1 + w_2), \tag{14.26} $$

where $w_1$ and $w_2$ are the vector means for the corresponding classes and
C denotes the pooled sample covariance matrix. The symptom vector s is 
recognized as belonging to the first class if L > 0, and as belonging to the 
second class otherwise. 

When there are more than two groups, we can estimate more than one 
discriminant function like the one presented above. In the case of Fisher’s dis- 
criminant functions, it is assumed that the data (for the variables) represent 
a sample from the multivariate normal distribution, where the expected values
$w = \{w_1, w_2, \ldots, w_M\}$ can be estimated by averaging within the class.
Moreover, it is assumed that the covariance matrices of variables are homogeneous
across groups, and the common covariance matrix can be estimated
as an average value, i.e.,



$$ C = \frac{1}{N - M} \sum_{m=1}^{M} C_m, \tag{14.27} $$

where $C_m$ denotes the covariance matrix for the class m. The linear discriminant
functions for a given symptom vector s can be derived via

$$ L_{m,l} = s^T C^{-1} (w_m - w_l) - \frac{1}{2} (w_m - w_l)^T C^{-1} (w_m + w_l). \tag{14.28} $$

The symptom vector s can be assigned to the class m on the condition
that $L_{m,l} > 0$ for $m \neq l$. One can observe that $L_{m,l} = -L_{l,m}$. If the size of the
parameter vector is not smaller than the number of classes, i.e., $N > M - 1$,
then there are (M - 1) linear statistics $L_{m,l}$. In the opposite case, the linear
space generated by the statistics $L_{m,l}$ is N-dimensional.

Let there be three classes in the diagnostic problem. Calculating two
statistics $L_{1,2}$ and $L_{1,3}$ will be enough to calculate all the possible statistics
L as $L_{2,3} = L_{1,3} - L_{1,2}$. The classification is based on the following rules:

• if $L_{1,2} > 0$ and $L_{1,3} > 0$ then $F_1$,

• if $L_{1,2} < 0$ and $L_{1,3} > L_{1,2}$ then $F_2$,

• if $L_{1,3} < 0$ and $L_{1,2} > L_{1,3}$ then $F_3$.
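
The pairwise statistics and the resulting decision rules can be checked numerically with the Python sketch below. It takes the class means and a common covariance matrix as given inputs, which is an assumption of the example, as are the function names; class indices are 0-based in the code, so index 0 corresponds to the first class in the rules above.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pairwise_statistic(s, w_m, w_l, cov):
    """Linear statistic comparing class m with class l for a symmetric covariance."""
    diff = [x - y for x, y in zip(w_m, w_l)]
    total = [x + y for x, y in zip(w_m, w_l)]
    v = solve(cov, diff)  # v = C^-1 (w_m - w_l)
    return dot(v, s) - 0.5 * dot(v, total)


def classify(s, means, cov):
    """Return the index m with a positive statistic against every other class."""
    for m, w_m in enumerate(means):
        if all(pairwise_statistic(s, w_m, w_l, cov) > 0
               for l, w_l in enumerate(means) if l != m):
            return m
    return None  # s lies exactly on a decision boundary


# Hypothetical class means and common covariance matrix.
means = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
cov = [[2.0, 0.5], [0.5, 1.0]]
s = (1.0, 2.5)
l12 = pairwise_statistic(s, means[0], means[1], cov)
l13 = pairwise_statistic(s, means[0], means[2], cov)
l23 = pairwise_statistic(s, means[1], means[2], cov)
```

Here `l23` agrees with `l13 - l12` up to rounding, so only two of the three statistics need to be evaluated in the three-class case.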

Discriminant analysis can be modified using the a priori probabilities
$P_m$ of the occurrence of class $F_m$. The conditional probability that s is from
the class m is given by

$$ P(F_m | s) = \frac{P_m |C_m|^{-1/2} \exp\left[ -\frac{1}{2} (s - w_m)^T C_m^{-1} (s - w_m) \right]}{\sum_{l=1}^{M} P_l |C_l|^{-1/2} \exp\left[ -\frac{1}{2} (s - w_l)^T C_l^{-1} (s - w_l) \right]}. \tag{14.29} $$

The vector s is classified as belonging to the class $F_j$ according to the
decision rule as follows:

$$ s \in F_j, \quad \text{if} \quad P(F_j | s) = \max_i P(F_i | s). \tag{14.30} $$



Another commonly known technique is quadratic discriminant analysis,
which allows calculating the discriminant function via

$$ L_m(s) = w_m^T C^{-1} \left( s - \frac{1}{2} w_m \right) + \log P_m. \tag{14.31} $$



If the two populations are normally distributed, the quadratic discriminant 
rule is the best discriminant rule, in the sense of the minimization of the 
expected misclassification probabilities. However, for a small number of sam- 
ples, the quadratic discriminant function can behave appreciably worse than 
linear functions. 

A well-known approach to statistical classification is Bayesian decision- 
making based on the Bayes theorem, which is given by 



$$ P(f_i | s) = \frac{P(s | f_i) P(f_i)}{P(s)}, \tag{14.32} $$



where P(s) denotes the probability of the occurrence of the vector s, $P(s|f_i)$
is the conditional probability for the symptom s given that it belongs to the
fault $f_i$, while $P(f_i)$ denotes the a priori probability of the fault (state) $f_i$
(Leonhardt and Ayoubi, 1997). The left-hand side probability of (14.32) is
the a posteriori probability of a certain fault once a specific symptom vector
s has been observed. A standard approach of the Bayesian decision theory
is to minimize the risk of making an incorrect classification. It is done if, for a
specific symptom s, the fault $f_i$ with the highest a posteriori probability is
chosen. Let J be the cost function defined by



$$ J(\hat{F}, F) = \begin{cases} 0 & \text{for } \hat{F} = F \ \text{(correct decision)}, \\ J_w & \text{for } \hat{F} \neq F \ \text{(wrong decision)}, \\ J_r & \text{for } \hat{F} = F_0 \ \text{(decision refusal)}, \end{cases} \tag{14.33} $$

where $\hat{F}$ is the optimal estimate for F, and $F_0$ denotes a lack of decision.
The optimal decision rule is given by

$$ \text{find } \hat{F} \text{ such that} \quad \begin{cases} P(\hat{F} | s) = \max_i \{ P(F_i | s) \}, \\ P(\hat{F} | s) > 1 - J_r / J_w. \end{cases} \tag{14.34} $$

If these conditions are not satisfied, i.e., $\hat{F} = F_0$, then there is a lack of
decision. The decision rule actually selects the fault class with the highest
probability, if and only if the probability for that specific class is higher
than the relative risk of a wrong decision. Upon applying (14.32)-(14.34), we
obtain

$$ \begin{aligned} P(s | \hat{F}) P(\hat{F}) &= \max_i \{ P(s | F_i) P(F_i) \}, \\ P(s | \hat{F}) P(\hat{F}) &> (1 - J_r / J_w) P(s). \end{aligned} \tag{14.35} $$

Based on the assumption that the conditional probability has the normal distribution
with the mean value $\mu$ and the covariance matrix K, and assuming
further that the probability for all faults is $P(F_i) = P_F$, a decision boundary
line can be computed by

$$ L_b(s) = \frac{P_F}{\sqrt{(2\pi)^m \det(K)}} \exp\left[ -\frac{1}{2} (s - \mu)^T K^{-1} (s - \mu) \right]. \tag{14.36} $$



An advantage of Bayesian methods is that the probability of an error 
can be estimated for each decision. Their main disadvantage is that the a 
priori distributions P{Fi) as well as the conditional probabilities P(s|F) 
must be known, estimated or assumed, which can be a difficult task when 
inappropriate distributions are used (Looney, 1997). 
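
A Bayesian decision with a refusal option can be sketched as follows. For brevity the Python example assumes one-dimensional Gaussian class-conditional densities with known priors, means and variances; the rejection threshold is a free parameter supplied by the caller, and all names are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of the one-dimensional normal distribution N(mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def posteriors(s, classes):
    """A posteriori probabilities via the Bayes theorem.

    `classes` is a list of (prior, mean, variance) triples, one per fault class.
    """
    joint = [prior * gaussian_pdf(s, mean, var) for prior, mean, var in classes]
    total = sum(joint)  # P(s)
    return [j / total for j in joint]


def decide(s, classes, reject_threshold):
    """Return the most probable class index, or None to refuse a decision."""
    post = posteriors(s, classes)
    best = max(range(len(post)), key=post.__getitem__)
    return best if post[best] > reject_threshold else None


# Hypothetical two-fault problem with equal priors and unit variances.
classes = [(0.5, 0.0, 1.0), (0.5, 4.0, 1.0)]
halfway = decide(2.0, classes, reject_threshold=0.9)  # ambiguous symptom
clear = decide(0.0, classes, reject_threshold=0.9)    # close to the first mean
```

The midpoint between the two means yields equal posteriors, so the decision is refused there, while symptoms close to either mean are classified.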



14.4.3. Approximation approach 



The approximation approach consists in determining the mapping function
between the symptom space and the fault space, i.e., S → F. The membership
function $F_i(s)$ is calculated for each fault i.

The function $F_i(s)$ can be expanded into a series with respect to an established
family of functions $\varphi$ in the following way:

$$ \forall_i \ \forall_{s \in S} \quad F_i(s) = \sum_{u=0}^{\infty} w_{i,u} \varphi_u(s), \tag{14.37} $$



where the expansion coefficients w determine the specific function $F_i(s)$, and
the established bases form the family

$$ \Phi = \{ \varphi_0(s), \varphi_1(s), \varphi_2(s), \ldots \}. \tag{14.38} $$

If the functions $F_i(s)$ are “well expanding” with respect to a given family
$\Phi$, then the weights for the further components u > m can be omitted for
$w_{i,u} < \varepsilon$, which leads to

$$ F_i(s) = \sum_{u=0}^{m} w_{i,u} \varphi_u(s), \tag{14.39} $$

where $\varphi_0, \varphi_1, \ldots, \varphi_m$ are basis functions in the (m + 1)-dimensional linear
subspace $S_{m+1}$.

Designing the classifier consists in selecting a proper basis function family
and adjusting weights on the basis of the learning set. Weights are the only 
parameters that have to be memorized, because they “store” in some sense 
the knowledge discovered from the learning set. 

The most commonly used subspace $S_m$ is composed of monomials of
degree at most m:

$ = (14.40) 

Other subspaces may also be used, such as trigonometric functions, 
Tchebyshev’s polynomials, Legendre’s polynomials and Laguerre’s polyno- 
mials. Suppose that the decision rule can be sufficiently approximated by 
polynomials of order h defined by (Leonhardt and Ayoubi, 1997): 

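$$ \begin{aligned} g_{F_i}(s) = {} & w_{0,0} + w_{1,1} s_1 + \cdots + w_{1,h} s_1^h \\ & + w_{2,1} s_2 + \cdots + w_{2,h} s_2^h + \cdots \\ & + w_{n,1} s_n + \cdots + w_{n,h} s_n^h \\ & + w_{n+1,1} s_1 s_2 + \cdots, \qquad i = 1, \ldots, q. \end{aligned} \tag{14.41} $$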

Let the vector x(s) comprise all polynomials and a linear combination of the
symptom components. Upon considering all q fault classes, we have

$$ g_F = W x(s). \tag{14.42} $$

If all possible linear combinations of polynomials are considered, then

$$ r = \dim(x(s)) = \binom{n + h}{h} \tag{14.43} $$

and dim(W) = q × r. The classifier training process consists in finding an
optimal estimate for the matrix W. The sum squared error can be used as
a criterion of optimality, i.e.,

$$ Q(W) = \sum_{i=1}^{r} \left\| f_{ref,i} - W x(s_{ref,i}) \right\|^2 \to \min. \tag{14.44} $$

The solution of such a problem may always be found, which results from the
Weierstrass theorem (Krantz, 1999), i.e., the problem amounts to solving a
system of linear equations.

The polynomial degree should be high enough to approximate the function
adequately, and low enough to smooth out oscillations resulting from
measuring errors. In practice, the polynomial degree is obtained experimentally
by increasing the degree of the polynomial in each iteration
until a satisfactory approximation error is achieved. The essential numerical
feature of this method is that when m > 6, the system becomes ill
conditioned. As the degree of the polynomial rises, more and more calculations
are needed and the results become uncertain. A solution
to this problem may be applying orthogonal polynomials, e.g., Gram’s
polynomials, Legendre’s polynomials, etc. A different approach to solving this
problem is applied when using machine learning methods, especially artificial
neural networks (Rojas, 1996; Looney, 1997).

The main feature of Artificial Neural Networks (ANNs) is their ability to 
learn and generalize the collected knowledge on the basis of examples from 
the learning data set. Therefore, ANNs can be used in many practical appli- 
cations, including diagnostics. ANNs exemplify the universal approximation 
system that can find mappings between multidimensional data sets without 
the necessity of knowing the mathematical models and causal connections. 
Designing ANNs consists in structure selection and adjusting the weights 
during the learning process (Rojas, 1996; Looney, 1997). 

It is estimated that over 80% of the applications of ANNs use the
Multi-Layer Perceptron (MLP). This type of structure is also the one most
commonly used in pattern recognition applications. Since ANNs are presented
in detail in another chapter of this book, we outline here only the
assumptions concerning their application to pattern recognition.

Suppose we have a classifier network with M output neurons, where there
is one output line for each fault class i = 1, ..., M. Each output $F_i$ is
trained to produce 1 when the input belongs to class i, and 0 otherwise.
Since the expected total error is the sum of the errors of each output, we
can minimize individual errors independently. It follows that such a
classifier, trained to classify an n-dimensional input x into one of M
classes, can actually learn to compute the a posteriori probabilities that
the input x belongs to each class. This property of the neural classifier
was stated and proved by Rojas (1996).

Proposition 14.1. A classifier neural network perfectly trained and with 
enough plasticity can learn the a posteriori probability of an empirical data 
set (Rojas, 1996). 

The estimated probabilities could be denoted as follows:

$$P(x \in C_i \mid x), \quad i = 1, \ldots, M, \tag{14.45}$$

where $x \in C_i$ means that x belongs to the class $C_i$ and i denotes the
class label (Xu et al., 1992). In practice, for each x any classifier only
estimates a set of approximations of true probability values. These
approximations depend on how the classifier is trained. For any classifier
$e_n$, a decision is made as follows:

$$e_n(x) = j \quad \text{with} \quad P_n(x \in C_j \mid x) = \max_{i} P_n(x \in C_i \mid x). \tag{14.46}$$

14.5. Developing reliable classifiers through redundancy



14.5.1. Concept of software redundancy 

Combining estimators to improve performance has quite a long history, and
can be found in a number of fields such as econometrics, machine learning and
software engineering (Dietterich, 2000; Littlewood and Miller, 1989; Sharkey,
1999). These approaches are related to attempts at developing fault-tolerant
computing. The idea of reliability through redundancy originates in
investigations of N-version programming (Filippi et al., 1994), where the
main aim is to produce program versions which show a minimum number of
coincident failures. In the area of pattern recognition, the concept of
redundancy has been proposed for the development of highly reliable character
recognition systems (Xu et al., 1992) and is currently being developed for
other pattern recognition applications in medical diagnostics and the
diagnostics of industrial processes (Marciniak, 2000). These techniques have
been refined for high reliability applications where system failures are life
threatening, e.g., in avionics, and for safe control of nuclear power plants
(Martin, 1983). Currently, they are used successfully in real systems such as
the Airbus Industrie A310 aircraft (a slat and flap control system) and the
Swedish State Railways (signal control and traffic control in the Gothenburg
area) (Sharkey, 1999).

The main reason for expecting individual classifiers to make
misclassification errors is the fact that they are usually trained on a
limited data set and are required to estimate the target function from it. A
combination of these imperfect classifiers can overcome the limitations of
the individual ones on the condition that they are error independent, i.e.,
that they generalize knowledge in different ways. In most instances, the
combination of multiple classifiers is a matter of combining their decisions
in parallel or in sequence, and there are numerous methods for that purpose.
In this chapter we consider only the parallel combination of classifiers,
called the ensemble.

The legitimacy of this approach can be justified using elements of
estimation theory, i.e., the concepts of the bias and the variance (Sharkey,
1999). A classifier is trained to construct a function f(x), based on a
training set $\{(x_1, y_1), \ldots, (x_n, y_n)\}$, for the purpose of
approximating y for previously unseen observations of x. Let f(x, D)
indicate the dependence of the estimator f on the training data D. Then the
Mean Squared Error (MSE) of f as a predictor of y is described by

$$\mathrm{MSE} = E_D\bigl[(f(x, D) - E[y \mid x])^2\bigr], \tag{14.47}$$

where $E_D$ denotes the expectation operator with respect to D (the average
over the set of possible training sets), and $E[y \mid x]$ is the target
function. MSE can be decomposed into two elements:

$$\mathrm{MSE} = \bigl(E_D[f(x, D)] - E[y \mid x]\bigr)^2 + E_D\bigl[(f(x, D) - E_D[f(x, D)])^2\bigr], \tag{14.48}$$

where the first element is the (squared) bias and the second is the variance of the
predictor. Both of them can be estimated when the predictor is trained on 
different sets of data. The bias is some measure of the predictor’s ability 
to generalize correctly to a test set. The variance can be characterized as a 
measure of the extent to which the output of a classifier is sensitive to the 
data on which it was trained, i.e., the extent to which the same results would 
have been obtained if a different set of training data had been used. 

The best generalization ability of the predictor is achieved when both
the bias and the variance are small. In practice, however, there is a
trade-off between them: during the learning process there is a conflict
between minimizing the bias by fitting the data too closely and minimizing
the variance by taking no notice of the data. The bias and the variance
may be calculated as averages over a number of possible training sets.
Krogh and Vedelsby (1995) formulated the definitions of the bias and the
variance for an ensemble. They express the relation between them in terms of
the ensemble average, instead of averaging over the training sets. In their
approach, the bias is a measure of the extent to which the ensemble output
averaged over all the ensemble members differs from the target function,
whilst the variance measures the extent to which the ensemble members
disagree.

An effective approach is to select a set of classifiers that exhibits a high
variance but a low bias, since the variance component can be lowered (or even
removed) by combining the classifiers’ responses. On the other hand,
combining classifiers that exhibit a high bias and a small variance makes
little sense, because combination methods have only a slight influence on the
bias component.
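
The effect of averaging on the two terms of (14.48) can be checked
numerically. The sketch below is only an illustration under idealized
assumptions: the target function, the noise level, the polynomial predictors
and the fact that every ensemble member gets its own independently drawn
training set (i.e., perfectly error-independent members) are all choices made
here, not taken from the chapter.

```python
import numpy as np

rng = np.random.default_rng(1)

def target(x):
    """Target function E[y|x], chosen arbitrarily for the illustration."""
    return np.sin(2 * np.pi * x)

def train_member(n_train=30, degree=5, noise=0.3):
    """Fit one low-bias, high-variance predictor on its own noisy training set."""
    x = rng.uniform(0.0, 1.0, n_train)
    y = target(x) + rng.normal(0.0, noise, n_train)
    return np.polyfit(x, y, degree)

x_test = np.linspace(0.1, 0.9, 41)
n_trials, n_members = 200, 10

single, ensemble = [], []
for _ in range(n_trials):
    preds = np.array([np.polyval(train_member(), x_test)
                      for _ in range(n_members)])
    single.append(preds[0])              # one predictor on its own
    ensemble.append(preds.mean(axis=0))  # average of all members
single, ensemble = np.array(single), np.array(ensemble)

def bias2_and_variance(preds):
    """Estimate both terms of (14.48), averaged over the test points."""
    mean_pred = preds.mean(axis=0)
    bias2 = np.mean((mean_pred - target(x_test)) ** 2)
    variance = np.mean(preds.var(axis=0))
    return float(bias2), float(variance)

single_bias2, single_var = bias2_and_variance(single)
ens_bias2, ens_var = bias2_and_variance(ensemble)
print("single  : bias^2 = %.4f, variance = %.4f" % (single_bias2, single_var))
print("ensemble: bias^2 = %.4f, variance = %.4f" % (ens_bias2, ens_var))
```

In this setting the variance term of the averaged predictor is markedly
smaller than that of a single member, while the bias term is not reduced by
averaging, which is why low-bias, high-variance members are preferred.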

The considerations so far have given no practical explanation of why an
ensemble of classifiers should be used, or whether it is possible in
practice to construct a good one. Assume we have a small, poorly
representative training set containing noisy observations. With these data,
the learning algorithm can find many different hypotheses h in the space
of hypotheses H. Each of them can give the same accuracy on the training
data, but their combination can reduce the risk of choosing a wrong
classifier, as shown in Fig. 14.7(a) (Dietterich, 2000).

The outer curve in Fig. 14.7(a) denotes the scope of the hypothesis
space H whilst the inner one bounds the set of hypotheses that give good
accuracy on the training data. By averaging the accurate hypotheses, we can
find a good approximation to the true hypothesis, which is denoted by f.

Even if we have a good quality learning set, finding the best hypothesis
can still be a difficult numerical problem. For example, the optimal training
of both neural networks and decision trees is NP-hard (Blum and Rivest,
1988). Therefore, when training algorithms based on local optimization
methods are employed, there is a high probability of getting stuck in local
optima. Combining the different hypotheses can overcome this limitation of
optimization methods, as shown in Fig. 14.7(b).

Fig. 14.7. Legitimacy of using an ensemble of classifiers: (a) statistical
approach, (b) numerical approach, (c) representational approach



In many machine learning applications, the true function f cannot be
represented by any of the hypotheses in H. Using the ensemble can expand
the space of possible functions, as shown in Fig. 14.7(c).



14.5.2. Diversification of classifiers 

It is obvious that there is no advantage in the generalization ability when 
we combine strongly correlated classifiers. On the other hand, the designed 
diagnostic system has to be distinguished by its reliability obtained from 
redundancy. It is very difficult to achieve an optimal balance between these 
properties. 

It is required that the classifiers be independent, which in practice means
that they should be error independent, i.e., they should make different
errors on the training set. Unfortunately, Knight and Leveson (1986) showed
that total independence of software versions is not achieved in practice.
They found that people tend to make the same mistakes when solving a
difficult intellectual problem, so that even programs written by different
people still behave similarly. Eckhardt and Lee (1985) presented results
confirming that even when the true independence of software versions is
achieved, they will still exhibit dependent behavior. Therefore Littlewood
and Miller (1989) introduced the idea of considering diversity instead of
independence. They assume that achieving a low or negative correlation
between the failures of many programs is more desirable than achieving the
total independence of failures. The diversification of classifiers can be
obtained by:

• varying the set of initial parameters, e.g., the weights in neural networks,

• varying the type or topology, e.g., k-NN and neural networks,

• varying data by using disjoint training sets, adaptive resampling,
bootstrapping, different data sources and different preprocessing methods,

• using special learning algorithms that can diversify the classifiers
during the learning process, so that the results achieved by some of the
classifiers depend on the results of others.

It is obvious that different diversification methods can be applied simul- 
taneously, e.g., bootstrapping with different types of classifiers. However, the 
diversification of classifiers is only a necessary condition for achieving a good 
quality ensemble. 

14.5.3. Levels in the output information of classifiers 

Let P be a pattern space consisting of mutually exclusive sets,
$P = C_1 \cup \cdots \cup C_M$, with each $C_i$,
$\forall i \in A = \{1, 2, \ldots, M\}$, representing a class.

The task of the classifier e is to assign one index $j \in A \cup \{M+1\}$
as a label to show that a given sample x belongs to the class $C_j$ if
$j \neq M+1$. If $j = M+1$, then we can consider it as a refusal of
recognition. Hence, the task of the classifier can be denoted by e(x) = j.

Although the label j is a desirable result of recognition, many classifiers 
are able to deliver some extra information. An example of such a classifier can 
be the Bayes one, which may also provide conditional a posteriori probabili- 
ties for each of the M classes. The label j is then a result of the maximum 
selection from the M values. 

Generally, output information can be found on three levels (Xu et al., 1992):

• abstraction - the classifier e outputs a unique label j, or a subset of
labels $J \subseteq A$,

• rank - the classifier e ranks all labels in A (or a subset
$J \subseteq A$) in a queue in ascending or descending order,

• measurement - the classifier e attributes to each label in A a
measurement value expressing the degree to which x belongs to the class
indicated by the label.

The measurement level contains the greatest amount of information and the 
abstract level contains the smallest amount. The transition from the mea- 
surement level to any other is connected with reducing the information. 

Many classifiers are able to provide output information from the measure- 
ment level, e.g., the Bayes classifier provides the a posteriori probabilities 
similarly to the neural networks and minimal-distance methods. However, 
some classifiers are unable to provide information from a level different than 
the abstract one, e.g., some syntactic classifiers and the Hopfield network, 
which outputs the pattern of the attractor instead of its label. 

Suppose we have K classifiers $e_k$, k = 1, ..., K. By the above, combining
the classifier outputs can take place on one of three levels:

• Level 1. Each classifier $e_k$ assigns x to the label $j_k$, i.e., it
produces an event $e_k(x) = j_k$. The problem is to use these events to
design an integrated classifier E, which should generate the final label j,
i.e., E(x) = j, $j \in A \cup \{M+1\}$.

• Level 2. For an input x, each classifier $e_k$ gives a subset
$L_k \subseteq A$ with the labels ranked in a queue. The problem is to use
these events $e_k(x) = L_k$, k = 1, ..., K, to design an integrated
classifier E, i.e., E(x) = j, $j \in A \cup \{M+1\}$.

• Level 3. For an input x, each classifier $e_k$ gives a real vector of
measures $m_e(k) = [m_k(1), \ldots, m_k(M)]$, where $m_k(i)$ denotes the
value of the membership of the sample x in the class i, which is given by
the classifier k.

One may consider an extra level of information at the output of the clas- 
sifier, i.e., information about the correctness of a given output. Its value is 
equal to 1 when the sample x is classified correctly, and to 0 otherwise. This 
level of information is used to analyze correlations between classifiers in an 
ensemble. 

It is not possible to discuss in detail all techniques that are used to combine 
the output of multiple classifiers, because their number is still increasing. The 
most commonly known methods are: 

• the abstraction level: 

- voting methods (majority and plurality rules), 

- belief integration, 

- the Behavior Knowledge Space (BKS) method, 

• the rank level: 

- class set reordering methods, 

- class set reduction methods, 

• the measurement level: 

- the averaged Bayes classifier, 

- the weighted Bayes classifier, 

- the fuzzy integral method. 
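
Two of the listed combination rules are simple enough to sketch directly. The
code below is a minimal illustration rather than a reference implementation:
the function names are hypothetical, classes are indexed 0..M-1 (so the
rejection label M+1 of the text becomes the index M here), and the strict
majority threshold K/2 is one common choice among several.

```python
import numpy as np

def majority_vote(labels, n_classes):
    """Abstraction level: combine K crisp labels (class indices 0..M-1).

    Returns the winning class, or n_classes (the rejection label) when no
    class collects more than half of the votes.
    """
    counts = np.bincount(labels, minlength=n_classes)
    winner = int(np.argmax(counts))
    return winner if counts[winner] > len(labels) / 2 else n_classes

def averaged_bayes(posteriors):
    """Measurement level: average the K x M posterior estimates, take argmax."""
    mean_post = np.mean(posteriors, axis=0)
    return int(np.argmax(mean_post)), mean_post

# Three classifiers, three classes.
print(majority_vote([2, 2, 0], n_classes=3))   # -> 2
print(majority_vote([0, 1, 2], n_classes=3))   # -> 3 (rejection)

posteriors = np.array([[0.6, 0.3, 0.1],
                       [0.2, 0.5, 0.3],
                       [0.1, 0.6, 0.3]])
label, mean_post = averaged_bayes(posteriors)
print(label)                                   # -> 1
```

Note how the measurement-level rule can select class 1 even though the first
classifier on its own would have chosen class 0; the abstraction-level rule
has no access to that extra information.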



14.6. Evaluation of classifiers’ accuracy 

14.6.1. Introduction 

A classifier is usually defined by its discriminant function f, which divides
the d-dimensional feature (symptom) space into as many areas as there are
classes (faults). If there exist c classes $\omega_i$, such that
$1 \le i \le c$, then the discriminant function can also be expressed by the
c class functions $f_i$, where $f_i(u) = 1$ when $f(u) = \omega_i$, and
$f_i(u) = 0$ otherwise.

The confusion matrix C(f) is a familiar means of illustrating the
performance of the classifier. Each element of this matrix gives the
probability (in percent) that the patterns of the class indicated by the row
index are attributed to the class indicated by the column index:

$$C_{ij} = 100 \int p(u \mid \omega_i)\, f_j(u)\, du, \tag{14.49}$$

where $p(u \mid \omega_i)$ is the probability density function of the
patterns belonging to the class $\omega_i$. It is easily seen that the
elements on the main diagonal of the matrix indicate the probabilities of a
correct classification into individual classes whilst the other elements
concern the misclassification probabilities.

The accuracy of the classifier can be expressed by the averaged
classification error given by

$$E(f) = \sum_{i=1}^{c} P_i\,(100 - C_{ii}), \tag{14.50}$$

where $P_i$ is an a priori probability of the class $\omega_i$.

When the problem statistics are unknown, they must be established in an
empirical way. Then only the so-called “apparent” confusion matrix
$\hat{C}(f)$ can be estimated over a test set. The individual elements of
$\hat{C}(f)$ are computed from

$$\hat{C}_{ij} = \frac{100}{N_i} \sum_{k=1}^{N_i} f_j(x(k)), \tag{14.51}$$

where x(k), $1 \le k \le N_i$, are the patterns from the test set that belong
to the class $\omega_i$. The estimate of the averaged classification error
can be calculated in the same way, using the values of the probabilities
$P_i$ estimated from the test set by calculating

$$\hat{P}_i = \frac{N_i}{\sum_{j=1}^{c} N_j}. \tag{14.52}$$

The number of samples in the test set is always finite. Therefore, it is 
very important to utilize to the fullest the available samples in order to have 
sufficient confidence in performance estimation. Error counting methods can 
estimate the generalization ability of the classifier. The most commonly used 
methods of error counting are presented below. 
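
As a small illustration, the apparent confusion matrix and the averaged error
can be computed from a labelled test set as follows. This is a sketch under
stated assumptions: the function name is hypothetical, classes are indexed
from 0, every class is assumed to occur at least once in the test set, and
the averaged error is taken as the prior-weighted sum of the off-diagonal
mass, i.e., sum over i of P_i (100 - C_ii).

```python
import numpy as np

def apparent_confusion_matrix(true_labels, predicted_labels, n_classes):
    """Return (C, priors, avg_error) with C and avg_error in percent.

    C[i, j] is the percentage of test patterns of class i assigned to class j.
    """
    counts = np.zeros((n_classes, n_classes))
    for t, p in zip(true_labels, predicted_labels):
        counts[t, p] += 1
    n_i = counts.sum(axis=1)              # test samples per true class
    C = 100.0 * counts / n_i[:, None]     # row-normalized, in percent
    priors = n_i / n_i.sum()              # class frequencies in the test set
    avg_error = float(np.sum(priors * (100.0 - np.diag(C))))
    return C, priors, avg_error

true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
pred = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0]
C, priors, avg_error = apparent_confusion_matrix(true, pred, n_classes=2)
print(C)          # rows: true class, columns: assigned class
print(avg_error)  # approximately 30, i.e., 3 errors in 10 samples
```

With priors estimated from the same test set, the averaged error coincides
with the plain fraction of misclassified test samples, as the example shows.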

14.6.2. Resubstitution method 

The performance of the classifier is estimated by testing it on the same data 
it has been trained with. It is easy to notice that the performance estimates 
are too optimistic. 

14.6.3. Holdout method 

The holdout method is one of the cross-validation techniques, which preclude
any sample from belonging simultaneously to both the testing set and the
learning set (these sets are mutually exclusive). It consists in randomly
splitting the set of available patterns into two sets with the same (or
nearly the same) number of elements. In order to make the results more
reliable, the procedure should be repeated several times and the results
averaged. At the same time, the mean value of the error may be computed
together with its standard deviation. For the comparison of different
classification methods, the standard deviation can be considered as a measure
of classifier robustness. If the error values obtained for the various
testing sets differ considerably, i.e., the standard deviation is large, one
can say that the classifier is not robust to a change of the learning set.

To achieve a representative distribution of classes in both the learning
set and the testing set, a stratified holdout method can be used. The
elements of either set are drawn so that the number of patterns of the same
class in both sets is equal (or nearly equal). However, the holdout method
has a tendency to overestimate the actual misclassification rate, whereas the
resubstitution method underestimates it.



14.6.4. Leave-one-out method

Let N be the number of all available samples in the data set. This method
consists in leaving one single sample for testing, and using the remaining
(N - 1) samples to train the classifier. The procedure is repeated N times
(with every sample used once as the testing one) and the results are averaged.

The leave-one-out method does give an upper bound of the error proba- 
bilities, but the estimate is more accurate than the one given by the holdout 
method (Jutten, 1995). Since the resubstitution method gives a lower bound 
of these probabilities and the leave-one-out method the upper bound, the real 
performance lies in between. 

14.6.5. Leave-k-out method

In this approach, the data are divided m = N/k times by leaving k samples
for testing, and using the others to train the classifier. The results of
the m trials are averaged. For k = 1 the method reduces to the leave-one-out
method. When k = N/2, it corresponds to the holdout method. Leave-k-out is a
compromise between these methods and usually gives a better estimation of
classifier performance (Jutten, 1995).
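
The error counting schemes of this section differ only in how the test
samples are set aside, which a short sketch makes explicit. The 1-NN base
classifier, the synthetic two-class data and the helper names are assumptions
made for illustration; k is assumed to divide N, otherwise the leftover
samples are never used for testing.

```python
import numpy as np

def nearest_neighbour_predict(X_train, y_train, X_test):
    """1-NN rule with the Euclidean metric (stand-in base classifier)."""
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    return y_train[np.argmin(d, axis=1)]

def leave_k_out_error(X, y, k, rng):
    """Average test error over m = N // k disjoint test blocks of size k."""
    N = len(X)
    order = rng.permutation(N)
    errors = []
    for t in range(N // k):
        test_idx = order[t * k:(t + 1) * k]
        train_idx = np.setdiff1d(order, test_idx)
        pred = nearest_neighbour_predict(X[train_idx], y[train_idx], X[test_idx])
        errors.append(np.mean(pred != y[test_idx]))
    return float(np.mean(errors))

# Hypothetical data: two overlapping Gaussian classes in two dimensions.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 1.0, (30, 2)), rng.normal(1.5, 1.0, (30, 2))])
y = np.array([0] * 30 + [1] * 30)

resub_error = float(np.mean(nearest_neighbour_predict(X, y, X) != y))
loo_error = leave_k_out_error(X, y, k=1, rng=rng)      # leave-one-out
l10_error = leave_k_out_error(X, y, k=10, rng=rng)     # leave-10-out
print(resub_error, loo_error, l10_error)
```

For the 1-NN rule the resubstitution error is zero by construction, since
every training sample is its own nearest neighbour, which illustrates how
optimistic that estimate can be compared with the leave-k-out figures.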



14.6.6. Bootstrapping methods 

The basic idea of these methods is to repeat the classification experiment
many times and to calculate the statistics by averaging (Efron, 1979).
Bootstrapping methods differ from cross-validation ones in the way the
samples are drawn. There is no condition that the testing and learning sets
be mutually exclusive, so each sample can occur in both of them.

As an example of a bootstrapping method we present the Adaptive Resampling
and Combining (ARC) method (Sharkey, 1999). Generally, this method consists
in sample selection with probability proportional to the achieved
misclassification rates. Let P denote the data set, {P(n)} be the set of
probabilities that are defined for each sample in the data set, and i be the
iteration counter. The probabilities are initially equal in value, i.e.,
$P_i(n) = 1/N$ for each sample. The algorithm contains the following stages:

1. Use the current probability values $\{P_i(n)\}$ to select the training
subset $P_i$, and then train a classifier with this subset.

2. Test the classifier using the whole set P. For every sample, set the
indicator d(n) to 1 when the sample was classified correctly, and to 0
otherwise.

3. Calculate the auxiliary parameter $\varepsilon_i$ from

$$\varepsilon_i = \sum_{n} d(n) P_i(n), \tag{14.53}$$

and then update the probabilities by

$$P_{i+1}(n) = \frac{P_i(n)\, \beta_i^{d(n)}}{\sum_{n} P_i(n)\, \beta_i^{d(n)}}, \tag{14.54}$$

where $\beta_i$ is a weighting factor determined by $\varepsilon_i$.

The stop criterion for this algorithm is the number of iterations, which is 
established arbitrarily. 
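
The three stages can be sketched in a few lines. Several points below are
assumptions rather than part of the description above: the nearest-mean base
classifier, the resampled subset size (equal to N), the choice
beta = (1 - eps) / eps, which with d(n) = 1 for correctly classified samples
shrinks their probabilities and so shifts weight towards the misclassified
ones, and the reset to equal probabilities whenever eps falls outside
(0.5, 1).

```python
import numpy as np

def train_nearest_mean(X, y, n_classes):
    """Stand-in base classifier: one mean vector per class (hypothetical)."""
    overall = X.mean(axis=0)
    return np.array([X[y == c].mean(axis=0) if np.any(y == c) else overall
                     for c in range(n_classes)])

def predict_nearest_mean(means, X):
    d = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
    return np.argmin(d, axis=1)

def arc(X, y, n_classes, n_iter, rng):
    """Adaptive resampling sketch; returns the trained models and final P(n)."""
    N = len(X)
    p = np.full(N, 1.0 / N)                  # equal initial probabilities
    models = []
    for _ in range(n_iter):
        # Stage 1: draw the training subset according to the current P(n).
        idx = rng.choice(N, size=N, replace=True, p=p)
        means = train_nearest_mean(X[idx], y[idx], n_classes)
        models.append(means)
        # Stage 2: test on the whole set; d(n) = 1 for correct decisions.
        d = (predict_nearest_mean(means, X) == y).astype(float)
        # Stage 3: auxiliary parameter (14.53) and update (14.54).
        eps = float(np.sum(d * p))
        if eps <= 0.5 or eps >= 1.0:         # assumed safeguard: start over
            p = np.full(N, 1.0 / N)
            continue
        beta = (1.0 - eps) / eps             # assumed form of the factor
        p = p * beta ** d
        p = p / p.sum()
    return models, p

# Hypothetical data: two overlapping Gaussian classes.
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0.0, 1.0, (40, 2)), rng.normal(1.5, 1.0, (40, 2))])
y = np.array([0] * 40 + [1] * 40)
models, p = arc(X, y, n_classes=2, n_iter=5, rng=rng)
print(len(models), p.min(), p.max())
```

The fixed `n_iter` plays the role of the stop criterion; the classifiers
collected in `models` would then be combined, e.g., by voting.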

14.6.7. Comparison of classifiers’ performance and confidence intervals 

Many papers give comparative experiment results on the basis of average
classification errors counted with the holdout method (Jutten, 1995; Xu et
al., 1992). It is worth pointing out that such an approach is not reliable
from the statistical viewpoint, because there is no information on whether
different classifiers were tested on similar testing sets. In other words,
nothing ensures that the same result would be obtained for a different
partition of the data set with the holdout method. Therefore, confidence
intervals are calculated to compare the experiments.

Assume we have a classifier trained with a given learning set. Let the
random variable $A_i(x)$ be equal to 1 when the classifier recognizes the
sample x as belonging to the class $\omega_i$ and to 0 otherwise. Let $p_i$
be the probability for $A_i(x)$ to take the value 1. Consequently, the
probability for $A_i(x)$ to take the value 0 will be equal to $(1 - p_i)$.
Suppose there are N samples to classify. If the number of samples assigned
to the class i is obtained by counting $T_i = \sum A_i$, then $T_i$ has the
binomial distribution with the parameters $(p_i, N)$. Let B be the success
rate of the class i given by $B = T_i/N$. In the case of a large data set
(N > 30), the probability density function of B can be approximated by the
normal distribution with the mean $p_i$ and the standard deviation
$\sqrt{p_i(1 - p_i)/N}$.

The probability value $p_i$ can be estimated via

$$\hat{p}_i = \frac{N_i}{N}, \tag{14.55}$$

where $N_i$ is the number of samples recognized as belonging to the class
$\omega_i$. Applying (14.55) we can approximate the probability density
function of B by the normal distribution with the mean $N_i/N$ and the
standard deviation $\sqrt{N_i(N - N_i)/N^3}$. We thus get the 95% confidence
interval for the value of B from

$$\frac{N_i}{N} - 1.96\sqrt{\frac{N_i(N - N_i)}{N^3}} < B < \frac{N_i}{N} + 1.96\sqrt{\frac{N_i(N - N_i)}{N^3}}. \tag{14.56}$$

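Consequently, the 95% confidence interval for a random variable $C_{ik}$ of
the apparent confusion matrix can be calculated via

$$\hat{C}_{ik} - 1.96\sqrt{\frac{\hat{C}_{ik}(100 - \hat{C}_{ik})}{N_i}} < C_{ik} < \hat{C}_{ik} + 1.96\sqrt{\frac{\hat{C}_{ik}(100 - \hat{C}_{ik})}{N_i}}, \tag{14.57}$$

where $\hat{C}_{ik}$ is an element of the matrix, given in percent, and
$N_i$ is the number of samples belonging to the class $\omega_i$.
Furthermore, the confidence intervals on the apparent averaged classification
error $\hat{E}$ can be computed by

$$\hat{E} - 1.96\sqrt{\frac{\hat{E}(100 - \hat{E})}{N}} < E < \hat{E} + 1.96\sqrt{\frac{\hat{E}(100 - \hat{E})}{N}}. \tag{14.58}$$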
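
A brief sketch of these interval computations may be helpful. It relies on
the normal approximation described above; the function names are
hypothetical, and the percent version assumes that a rate c given in percent
and estimated from n samples has the standard deviation
sqrt(c (100 - c) / n).

```python
import math

def rate_confidence_interval(n_i, n, z=1.96):
    """95% interval for the success rate B = N_i / N (normal approximation)."""
    centre = n_i / n
    half_width = z * math.sqrt(n_i * (n - n_i) / n ** 3)
    return centre - half_width, centre + half_width

def percent_confidence_interval(c_hat, n, z=1.96):
    """95% interval for a rate c_hat given in percent, estimated from n samples."""
    half_width = z * math.sqrt(c_hat * (100.0 - c_hat) / n)
    return c_hat - half_width, c_hat + half_width

low, high = rate_confidence_interval(90, 100)
print(round(low, 4), round(high, 4))          # 0.8412 0.9588

low_pct, high_pct = percent_confidence_interval(90.0, 100)
print(round(low_pct, 2), round(high_pct, 2))  # 84.12 95.88
```

The two functions give the same interval on different scales. With only 100
test samples a 90% recognition rate is known to within roughly six percentage
points, so differences of one or two points between classifiers tested on
such a set are not statistically meaningful.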



The confidence interval is usually computed when the following tests are
applied:

• the resubstitution method,

• the holdout method without averaging, i.e., with only one experiment,

• the leave-k-out method with k < N.

Obviously, the comparison of classifiers will be most accurate if the same
test method and the same testing sets are used. Thus the leave-one-out method
is the most reliable and the most commonly used in comparisons, despite its
computational complexity.



14.7. Some classification problems 

In the case of classifiers established in an empirical way (using the
learning set), there is only one way, in principle, to compare their
performance, viz. by testing them on many data sets (benchmarks). On the
Internet one can find many databases (the so-called repositories) containing
benchmark problems. Probably the most commonly known repository is the UCI
Repository of Machine Learning Databases and Domain Theories, University of
California, Irvine (available at
http://www1.ics.uci.edu/~mlearn/MLRepository.html), which contains many real
classification problems concerning such disciplines as finance, medicine and
technology. Besides, it contains many artificial databases generated so as to
exhibit heavily overlapping class distributions and highly nonlinear class
boundaries in the multidimensional feature space. Because of their
difficulty, these databases are usually used for rapid tests of newly
developed algorithms. Some real benchmarks and classifier results for them
are presented below.

14.7.1. Breast cancer diagnosis 

The Wisconsin Breast Cancer Database is one of the most popular benchmark
problems on the Internet. The samples were collected by Wolberg at the
University of Wisconsin Hospitals, Madison. They were collected periodically
as Wolberg reported his clinical cases and can be used to distinguish between
benign and malignant breast lumps. The samples consist of visually assessed
nuclear features of Fine Needle Aspirates (FNAs) taken from the patients’
breasts. This technique is minimally invasive, so it is desirable in clinical
treatment, and therefore the problem of diagnosing with FNA images becomes
very important. Each sample was assigned a 9-dimensional vector by Wolberg,
whose elements are extracted from the sample image, e.g., the clump
thickness, uniformity of the cell size, marginal adhesion, etc. Each
component is evaluated on a scale from 1 to 10, with the value 1
corresponding to the normal state and 10 to the most abnormal state.
Malignancy is determined by taking a tissue sample from a patient’s breast
and performing a biopsy on it. A benign diagnosis is confirmed either by the
biopsy or by a periodic examination, depending on the patient’s choice. A
sample input image taken from an FNA is shown in Fig. 14.8.

An exemplary comparison of classifier results is presented in Table 14.2.
These results are partially taken from the Internet site Datasets used for
classification: comparison of results, available at the website of the
Nicholas Copernicus University in Toruń, Poland.

Table 14.1. Data profile

Description          Number of samples
Malignant            241
Benign               458
Total                699
Number of features   9

Fig. 14.8.

14.7.2. Diagnosis of erythemato-squamous diseases 

Simulation experiments have been performed with the use of the benchmark
data from the Dermatology Database* (available at the UCI Repository of
Machine Learning Databases and Domain Theories). The differential diagnosis
of erythemato-squamous diseases is a real problem in dermatology. They
all share the clinical features of erythema and scaling, with very few
differences. The patients were first evaluated clinically for 12 features
(e.g., erythema, scaling, age) and then skin samples were taken for the
evaluation of 22 histopathological features. The values of the features were
determined by analysing the samples under a microscope, and they belong to
the set {0, 1, 2, 3}. The distribution of the diseases in the database is
shown in Table 14.3.

The results obtained for the stratified holdout method are presented in
Table 14.4. The best results were achieved by an ensemble of multi-layer
feed-forward neural networks.



14.7.3. Fault diagnosis in a two-tank system 

In this experiment a structure of neural networks connected in parallel was
used to detect and isolate faults in a two-tank system, shown in Fig. 14.9.
The classifiers were implemented using Radial Basis Functions (RBF) (Rojas,
1996). The underlying idea of the presented multiple network
* Provided by H. Altay Guvenir, Bilkent University, Department of Computer Engi- 
neering and Information Science, 06533 Ankara, Turkey. 







Table 14.2. Comparison of results for the Wisconsin Breast Cancer Database

Method                                  Accuracy (%)   Error counting method
Feature Space Mapping                   98.3           Leave one out
3-NN (Manhattan metric)                 97.1           Leave one out
21-NN (Euclidean metric)                96.9           Leave one out
C4.5 (decision tree)                    96.0           Leave one out
RIAC (rule induction)                   95.0           Leave one out
SVM (5xCV)                              97.2           Leave 10 out
k-NN with DVDM                          97.1           Leave 10 out
IncNet                                  97.1           Leave 10 out
Linear Fisher discriminant              96.8           Leave 10 out
MLP with BP                             96.7           Leave 10 out
LVQ                                     96.6           Leave 10 out
Semi-Naive Bayes                        96.6           Leave 10 out
Naive Bayes                             96.4           Leave 10 out
IB1                                     96.3           Leave 10 out
RBF (Tooldiag)                          95.9           Leave 10 out
CART (tree)                             94.4           Leave 10 out
ID3                                     94.3           Leave 10 out
QDA (quadratic discriminant analysis)   34.5           Leave 10 out



Table 14.3. Distribution of classes in the Dermatology Database

Disease (class)            Number of instances
Psoriasis                  111
Seborrheic dermatitis      60
Lichen planus              71
Pityriasis rosea           48
Pityriasis rubra pilaris   20
Chronic dermatitis         48
Total:                     358

Table 14.4. Comparison of results for the Dermatology Database

Method                                     Recognition rate (%)   Error rate (%)   Rejection rate (%)
Ensemble of feed-forward neural networks   96.22                  3.78             0.00
1-NN with Euclidean metric                 94.60                  5.40             0.00
3-NN with Euclidean metric                 94.06                  5.40             0.54
5-NN with Euclidean metric                 94.05                  5.95             0.00



Fig. 14.9. Two-tank system scheme



scheme in an FDI system is to develop n independently trained neural
classifiers for n working points. The decision on which classifier should be
taken into account is made by a fuzzy supervisor. Its task is to assign a
certain degree of reliability to the output of each neural network on the
basis of the measured value of the variable indicating the working point. The
output value of the supervisor can be obtained from the formula

$$y = \frac{\sum_{i=1}^{N} y_i\, g_i(x)}{\sum_{i=1}^{N} g_i(x)}, \tag{14.59}$$

where $g_i$ is the working point membership function associated with the
i-th of the N neural networks and $y_i$ is the output of that network.

The fault is considered as a variation of a physical parameter of the linear
plant model. Therefore, the diagnostic process includes a parameter
identification block responsible for symptom extraction. The ARX model is
given by (Ficola et al., 1997):

$$y(t) = -a_1 y(t-1) - \cdots - a_{n_a} y(t-n_a) + b_1 u(t-1) + \cdots + b_{n_b} u(t-n_b) + e(t), \tag{14.60}$$

where e(t) is the error. The corresponding linear regression is

$$y(t) = h^T(t)\, g + e(t), \tag{14.61}$$

where $g = [a_1 \ldots a_{n_a}\ b_1 \ldots b_{n_b}]^T$ is the vector of the
lumped parameters to be identified, and h stands for the vector of regressors
$h(t) = [-y(t-1) \ldots -y(t-n_a)\ u(t-1) \ldots u(t-n_b)]^T$. The estimator
is given by the following equations:

$$L(t) = P(t-1)\, h(t)\, [\lambda + h^T(t) P(t-1) h(t)]^{-1}, \tag{14.62}$$

$$P(t) = \frac{1}{\lambda}\bigl(P(t-1) - L(t)\, h^T(t)\, P(t-1)\bigr), \tag{14.63}$$

$$\varepsilon(t) = y(t) - h^T(t)\, \hat{g}(t-1), \tag{14.64}$$

$$\hat{g}(t) = \hat{g}(t-1) + L(t)\, \varepsilon(t), \tag{14.65}$$

where P(t) denotes the error covariance matrix and $\lambda$ is the
forgetting factor.
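
The recursive estimator can be sketched as follows. This is an illustrative
implementation of standard recursive least squares with a forgetting factor,
not the code used in the experiment: the function name, the initial
covariance `p0 * I`, the value of the forgetting factor and the simulated
first-order plant are all assumptions.

```python
import numpy as np

def rls_arx(y, u, na, nb, lam=0.98, p0=1000.0):
    """Recursive least squares with forgetting factor lam for an ARX model.

    Returns the estimate of g = [a_1 ... a_na, b_1 ... b_nb].
    """
    n = na + nb
    g = np.zeros(n)              # initial parameter estimate (assumed zero)
    P = p0 * np.eye(n)           # initial covariance (assumed large)
    for t in range(max(na, nb), len(y)):
        # Regressor h(t) = [-y(t-1) ... -y(t-na), u(t-1) ... u(t-nb)].
        h = np.concatenate([-y[t - na:t][::-1], u[t - nb:t][::-1]])
        L = P @ h / (lam + h @ P @ h)        # gain vector
        eps = y[t] - h @ g                   # one-step prediction error
        g = g + L * eps                      # parameter update
        P = (P - np.outer(L, h) @ P) / lam   # covariance update
    return g

# Hypothetical first-order plant: y(t) = 0.8 y(t-1) + 0.5 u(t-1) + e(t),
# i.e., a_1 = -0.8 and b_1 = 0.5 in the ARX convention used above.
rng = np.random.default_rng(3)
T = 500
u = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.8 * y[t - 1] + 0.5 * u[t - 1] + 0.01 * rng.normal()

g_hat = rls_arx(y, u, na=1, nb=1)
print(g_hat)   # expected to be close to [-0.8, 0.5]
```

A forgetting factor below 1 discounts old data, so a fault that changes a
physical parameter shows up as a drift of the estimated vector, which is what
the neural classifiers then receive as symptoms.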

Water levels of 0.5 and 0.6 [m] in Tank 1 were considered. The nominal water
level in Tank 2 was 0.1 [m]. The actuators were the pump and the controller
of the switching valve V1. The following faults were simulated: a leakage in
Tank 1, the valve V1 blocked open, the valve V1 blocked closed, and
combinations of these. Two ARX model estimators were used to estimate the
transfer functions between the pump output and both water levels. Two neural
networks were trained for the two working points. Each fault was represented
by 50 samples in the reference set. The output pattern is defined as
F = [nominal, leakage, V1 opened, V1 closed]^T, where the items take values
from the interval [0,1]. Exemplary diagnosis results in the form of fault
membership values are shown in Fig. 14.10.



Fig. 14.10. Estimated F values for the following states: normal work
(upper-left), V1 opened (upper-right), V1 closed (middle-left), leakage in
Tank 1 (middle-right), simultaneous leakage and blockage of closed V1
(lower)



14.8. Summary

This chapter includes a brief survey of some commonly used pattern
recognition methods and their application to the diagnostics of both
medical and technical processes. Pattern recognition methods can supplement
model-based diagnostic systems. Their task there is to make a decision
(fault isolation) on the basis of residuals, i.e., the differences between
the model and system outputs. An alternative approach is to use
classification methods to build a direct mapping between the symptoms and
faults without using the model. This feature-based approach seems to be
appropriate when the residual signals generated by models do not carry
enough information to isolate the faulty states. However, it performs
poorly if the features are not well chosen or not well extracted. The
feature extraction/selection problem is crucial for the performance of
feature-based FDI systems, and a lot of research work still has to be done
in this field. The presented methods do not solve the completeness problem:
in real applications, unexpected symptom combinations may still cause
problems.
Chapter 15



EXPERT SYSTEMS 
IN TECHNICAL DIAGNOSTICS 



Wojciech CHOLEWA* 



15.1. Introduction 

Modern measurement technology makes it possible to continuously observe
and record signals connected with the courses of technological processes
and with the machinery or devices which take part in these processes. Most
often, the signals are supplied to modules which analyse them in order to
estimate a set of their features forming the symptoms of the present
technical state of the observed object. A particular property of the
problems of technical diagnostics is that they usually concern objects
(e.g., machines) of different constructions. This requires a distinction
between the forms of databases and a specialisation of the rule sets
applied within the inference process dealing with the technical state of
the object. An additional difficulty is that the history of changes
occurring in the observed objects (e.g., modernization) and the history of
their maintenance (e.g., repair and control) must be recorded and taken
into account in the inference process. The need for applying monitoring and
diagnosing devices to complex technical objects is the main reason for
research whose goal is to find proper tools for aiding the design and
maintenance of such devices. Interpreting the results of signal analysis is
a difficult task. It always requires some kind of experience, regardless of
whether the diagnosis is based on an exhaustive model of the object or on
diagnostic rules which are considered to be valid for a determined class of
machinery.

Due to the lack of general methods of describing diagnostic (expert's)
knowledge and the impossibility of algorithmising diagnostic inference,
programs which aid performing a diagnosis are difficult to work out. The
domain whose aim is to search for solutions of such complicated problems is
Artificial Intelligence (AI), which is a branch of computer science. AI is
concerned with methods and techniques of symbolic inference with the use of
computers and a symbolic representation of knowledge. Knowledge
representation is understood as a general formalism of recording, capturing
and storing any fragment of knowledge, independent of the information
considered. Presenting a commonly accepted definition of AI is difficult,
if not impossible. One reason is that some scientists insist that there is
a contradiction between the meanings of the terms artificial and
intelligence. However, definitions emphasizing different aspects of this
domain are often cited. One of the first definitions was formulated by
Minsky, who stated that artificial intelligence is the science of machines
which carry out tasks that demand intelligence when undertaken by a human
being. According to Feigenbaum, Artificial Intelligence is a domain related
to methods and techniques of symbolic inference with the use of computers
and a symbolic representation of knowledge. While the definition given by
Minsky is, because of its generality, very near to the one commonly
understood, the definition by Feigenbaum puts emphasis on the tools, namely
the symbolic inference methods and knowledge representation used by the
machine in solving the task. Among the numerous problems considered as
related to artificial intelligence, a variety of reasoning-oriented studies
concern expert systems.

* Silesian University of Technology in Gliwice, Department of Fundamentals
of Machinery Design, ul. Konarskiego 18a, 44-100 Gliwice, Poland, e-mail:
wch@polsl.pl

An expert system is a computer program designed for solving specialized
tasks which demand professional knowledge and experience in applying this
knowledge. Attempts are made to apply expert systems in order to aid the
observation of the examined object (e.g., complex monitoring systems), the
numerical estimation of signal features (e.g., programmed signal
analysers), the capturing of data about the examined objects, and the
inference about the technical state of the object on the basis of estimated
features of diagnostic signals.

Expert systems, which use a specialist's knowledge concerning a selected
domain, may apply this knowledge effectively and repeatedly. At the same
time, they relieve the human expert of repeating the same or analogous
assessments, which lets the expert undertake more creative tasks. A
particular advantage of such systems is that they permit solving some tasks
without the direct participation of a specialist. Moreover, they are able
to apply knowledge acquired from a team of specialists.

The basic element of a majority of expert systems is the inference module 
(Fig. 15.1). The classes of expert systems are determined by the assumed 
principles of their operation. The possibility of applying a particular expert 
system often depends on the operation of the explaining module and the
module responsible for the dialogue with the user (the user interface
module).

Fig. 15.1. Main elements of the static expert system



The main tasks undertaken by most expert systems can be related to one 
(or a few) of the following categories: 

• interpreting (interpretation systems applied to the supervision, recog- 
nition of speech or patterns, identification of signals); 

• diagnosing (diagnostic systems used in industry, medicine or economy); 

• predicting (systems inferring about the future on the basis of given 
situations - e.g., weather forecast, forecast of future crops, prediction of the 
development of an illness or the range of required repairs, etc.); 

• assembling (systems which configure objects subject to various
constraints - e.g., assembling complex computer hardware);

• planning (programming an action, automatic programming of robots, 
planning experiments or military actions, etc.); 

• monitoring (systems of continuous supervision for technological pro- 
cesses, medicine or traffic, etc.); 

• repairing (systems creating schedules of operations carried out while 
damaged objects are being repaired); 

• instructing (systems of professional improvement for students or users 
of complex devices); 

• controlling (systems which interpret, predict, repair and monitor the 
behaviour of the object). 

The literature which discusses different aspects related to the
construction and application of expert systems is very extensive.
Nowadays, the purposefulness of applying these systems is not questioned,
but one can state that there are few examples of their application. The
reason is the significant difficulty which appears at the design and
construction stage when the system is required to operate in real time.
However, expert systems have been applied as diagnostic ones from the very
beginning of their development (Shortliffe, 1976). Their application areas
include medical as well as technical diagnostics. In spite of the seeming
similarity between medical and technical applications, which results from
the fact that both are related to diagnostics, there are considerable
differences between them. These differences concern, in particular, the
ways of representing knowledge. The object of medical diagnostics belongs
to a single class of objects. Nowadays, the knowledge of these objects is
great and is continuously expanding. The subjects of interest of technical
diagnostics are technical objects (processes, machines, devices), which
are usually characterised by great diversity. The knowledge of these
objects is often insufficient.

One can consider two categories of diagnostic expert systems: 

• the first category includes static systems, which usually operate off-line
and aid the solving of problems in a constant environment;

• the second category encompasses dynamic systems, which often operate 
on-line; they are designed for undertaking tasks in a varying environment, 
limited time periods and insufficient resources (information). 

It is also possible to enumerate systems which belong to one of the above
categories and operate according to different principles. Owing to the
limited scope of this chapter, only two classes of expert systems are taken
into consideration. Their particular property is that the knowledge
necessary for their operation is written in the form of sets of diagnostic
rules or as belief networks. Accordingly, systems of the first class are
called rule expert systems. The great interest in these systems and their
popularity can be explained by the simplicity of their operation.

It is worth stressing that a particular property of expert systems favours
their dissemination: the basic elements of the system (Fig. 15.1), i.e.,
the inference module, the explanation module and the module which enables
us to communicate with the environment, can be developed independently of
the procedures of recording data in databases and knowledge bases. Such an
approach makes it possible to develop expert systems which include empty
databases and knowledge bases. These systems are called shells. They may be
worked out by specialists in the field of numerical methods and techniques
of artificial intelligence. The main advantage of the shell system is that
difficult tasks, such as the construction of fixed modules of the system,
do not have to be solved by the domain specialist. The only task the
specialist should undertake is to define the knowledge base and the way
the knowledge is acquired.

Knowledge acquisition becomes a separate task. The number of acqui- 
sition techniques is enormous (Moczulski, 2002). They are discussed in 
Chapter 17. 



15.2. Knowledge representation 

The meaning of knowledge representation is often freely interpreted by
different authors. This term determines the general formalism of passing,
recording and capturing any fragment of knowledge, independent of the
domain and the information considered. An example of the necessity of
determining such a formalism is the situation when the specialists'
knowledge of the selected field is to be used in the expert system. In this
case, recording a dialogue with the specialist in computer memory serves no
purpose. Moreover, no efficient algorithm exists for translating this
dialogue into a formalised language. The most common forms of knowledge
representation are statements formulated in natural languages (e.g.,
publications) or in the form of signs of a formalised meaning (e.g.,
mathematics). Their understanding depends on the possession of knowledge
and the ability to apply it. It is usually required that the knowledge
notation be simple, complete (exhaustive), concise, comprehensible and
clear, without any conjecture or ambiguous elements. For example, the last
criterion results from the requirement that the authors of the knowledge
base and the users of the system where the base is included understand
terms and their meanings similarly.

In order to simplify the problem, one may state that the discussed knowl- 
edge notation should allow identifying specified objects and relationships be- 
tween the objects taking into account the domain considered. The objects 
can be real, existing things, or abstract terms. 

The formalisation of the ways of knowledge representation is a difficult
problem, which has not yet found a sufficiently general solution. Examples
of attempted solutions include KIF (Knowledge Interchange Format; KIF
Internet) and KQML (Knowledge Query and Manipulation Language; KQML
Internet). It has been stated that the selection of a given technique of
knowledge notation should follow from what is to be recorded. The subject
of notation can be individual objects, properties of objects, relationships
between objects, single events, reasons and causes of events, results of
events, statements, etc. While constructing large expert systems, there is
a particular problem connected with the so-called meta-knowledge
representation. This type of knowledge determines the ways of knowledge
application and management, thus it is a kind of knowledge about knowledge.

In most cases, individual knowledge units concerning a given domain are
mutually related in different ways. They reflect thematic ranges of some
terms, etc. Apart from that, a need for writing some specific information
often appears. It is connected with contradictory information, information
dealing with exceptions and the necessity of verifying whether information
is true. It is particularly important when information is obtained from the
opinions of several specialists, which may contradict one another. In
numerous applications, the formal assumption that the knowledge base does
not include such contradictory information is not a proper one, because it
makes it difficult to write useful rules which are true in most cases, but
not always.

One of the main advantages of the expert system is the possibility of
inference based on large sets of knowledge concerning a given domain.
Inference methods and techniques of knowledge representation are only
partially related to the nature of the task to be solved. This means that
they need not be arranged individually for every domain. The techniques
presently applied can be divided according to the following two types of
knowledge representation:

• procedural representation, which consists in determining a set of
procedures representing the knowledge of a domain (e.g., the procedure for
calculating the volume of a solid, the procedure for determining the day
name on the basis of the date);

• declarative representation, which consists in determining a set of
statements and rules specific to the domain considered (e.g., the catalogue
of rolling bearings written in the database).
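
The two types of representation can be contrasted in a short Python sketch. The function names and the catalogue fragment are hypothetical and serve only as an illustration: the day-name knowledge is embedded in a procedure, whereas the bearing knowledge is stored as data that a generic query inspects.

```python
from datetime import date

# Procedural representation: the knowledge is the procedure itself.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def day_name(year, month, day):
    """Return the name of the weekday for a given date."""
    return DAY_NAMES[date(year, month, day).weekday()]


# Declarative representation: the knowledge is a set of stored statements.
# Hypothetical catalogue fragment; dimensions in mm
# (bore d, outer diameter D, width B).
BEARING_CATALOGUE = {
    "6204": {"d": 20, "D": 47, "B": 14},
    "6205": {"d": 25, "D": 52, "B": 15},
}


def bearings_with_bore(d):
    """Generic query; extending the catalogue does not require changing it."""
    return sorted(name for name, dims in BEARING_CATALOGUE.items()
                  if dims["d"] == d)
```

Adding a bearing changes only the data, while changing the calendar rule would require modifying the program, which is the trade-off discussed in the text.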

The advantage of the procedural notation of knowledge is that the analysed
processes may be represented with high efficiency. Declarative
representation, in turn, is more economical: each statement and rule is
written only once, and their formalisation is easier. The knowledge of the
domain written in the procedural form is often taken into account at the
stage of programming. Thus, the knowledge base is not a separate,
independent element of the expert system. The declarative representation of
domain knowledge entails that the knowledge base is separate, which
facilitates further modifications. When the knowledge base is not an
element independent of the program, every modification requires changes to
this program, which usually cannot be made by the program's users. The
optimal representation combines properties of the procedural and
declarative approaches.

The techniques of knowledge representation which are most often ap- 
plied (techniques of knowledge base organisation) are: approaches based 
on direct application of logic, decision tables, statement notation, rule no- 
tation, semantic networks, networks of statements, neural nets, and belief 
networks. 

It is worth stressing that the enumerated techniques are seldom applied
alone; instead, they are combined. The selection of the technique of
knowledge representation depends on several factors. The most important
ones are:

• the sort of knowledge which is required for the correct operation of the
expert system;

• the kind of domain whose knowledge is to be written;

• the required size of the knowledge base which is being organised, where
a needless increase in this size should be avoided.



15.3. Statements and rules 

There are numerous examples of expert systems in the literature. Within
the class of systems dedicated to aiding technical diagnostics, systems
based on statements and rules are particularly useful.

15.3.1. Statements 

A statement is information about accepting a proposition about observed
facts or expressed opinions. There are some attempts to differentiate
between opinions, statements, news and announcements. In order to limit
excessive formalism, the statement x is defined as an ordered tuple:

    x = (o, a, v, w, t, b),    (15.1)

where o, a, v is the content of the statement, i.e., the opinion that the
object o is assigned the attribute a, whose value is v. The parameter t
determines the time period (or the time moment) in which the object o is
analysed. The parameter b is an estimate of the degree of truth of, or
belief in, the content o, a, v at the time t. The weight of the statement,
interpreted as the degree of importance, is represented by w. The weight
can be, for example, the basis for ordering messages sent to the users of
the system.

In order to establish the functions which return the values of consecutive
elements of the tuple (15.1) corresponding to the statement x, the
following notation will be applied:

    o(x), a(x), v(x), w(x), t(x), b(x).    (15.2)

The statement can be understood as a generalised term of a proposition, 
which is used in propositional logic. This kind of logic concerns only such 
statements which may be approved as true or false. Generalisation related to 
statements admits propositions which can be approved neither as true nor as 
false ones. The content of the proposition is information about the observed 
facts representing specified opinions. The proposition is represented by the 
following elements of the statement x: 

    (o(x), a(x), v(x)).    (15.3)
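
A statement of the form (15.1) maps naturally onto a named tuple. The sketch below is only an illustration: the field types and the example values are assumptions, since the text does not prescribe how objects, times or degrees of belief are encoded.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """Statement x = (o, a, v, w, t, b), cf. (15.1)."""
    o: str      # object
    a: str      # attribute
    v: str      # value of the attribute
    w: float    # weight (degree of importance)
    t: float    # time moment or period identifier
    b: float    # degree of truth of, or belief in, the content


def proposition(x):
    """Return the content (o(x), a(x), v(x)) of the statement, cf. (15.3)."""
    return (x.o, x.a, x.v)


# Hypothetical statement about a monitored machine.
x = Statement(o="bearing 2", a="temperature", v="too high",
              w=0.8, t=120.0, b=0.9)
```

The accessor functions of (15.2) correspond here to the attribute access `x.o`, `x.a`, and so on.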
In order to simplify tasks related to the notation of statements as well
as the generation of explanations of statement contents, one can introduce
dictionaries of propositions (15.3). Since the content of the statement x
usually includes elements which may be repeated in the contents of other
statements, it is reasonable to create additional dictionaries of object
names, attributes and their values. Then, the dictionary of statement
contents (i.e., the dictionary of propositions) may include only their
indexes. The dictionaries may also gather additional descriptions and
explanations related to their elements. Obviously, the application of
dictionaries requires laborious operations concerning the specification of
the names applied and the establishing of their meanings.
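
The idea of storing propositions as indexes into dictionaries can be sketched as follows. The class and function names are hypothetical, and the explanation template is an assumption made only for this example.

```python
class Dictionary:
    """Maps names to integer indexes and back; each name is stored once."""

    def __init__(self):
        self._index = {}
        self._names = []

    def index_of(self, name):
        if name not in self._index:
            self._index[name] = len(self._names)
            self._names.append(name)
        return self._index[name]

    def name_of(self, index):
        return self._names[index]


objects, attributes, values = Dictionary(), Dictionary(), Dictionary()


def encode(o, a, v):
    """Store a proposition as a triple of dictionary indexes."""
    return (objects.index_of(o), attributes.index_of(a), values.index_of(v))


def explain(p):
    """Generate a readable explanation of an encoded proposition."""
    o, a, v = p
    return f"{attributes.name_of(a)} of {objects.name_of(o)} is {values.name_of(v)}"


p1 = encode("bearing 1", "temperature", "normal")
p2 = encode("bearing 2", "temperature", "too high")
```

Both propositions share the index of the attribute "temperature", so the name and any description attached to it are kept in one place.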



15.3.2. Rules 

The sets of statements (e.g., dynamic databases) are not enough to repre- 
sent the knowledge of the domain considered. Diagnostic knowledge is often 
written in the form of rules, which are defined as follows: 

RULE: if premise then conclusion,    (15.4)

where premise is an expression consisting of simple logical propositions con- 
nected by the functors ‘and’ and ‘or’. The premise determines conditions, 
resulting from propositions, which should be fulfilled for the conclusion to be 
accepted. Such acceptance corresponds to recording in the dynamic database 
propositions related to this conclusion. 

As examples of diagnostic rules (a fragment of the expert system which 
aids the diagnostics of compressors) one can give: 

RULE:
    if symptoms of whirl vibration in the slide bearing were detected
    and temperature of the bearing decreased,                        (15.5)
    then radial clearance in the bearing is too high.

RULE:
    if symptoms of whirl vibration in the slide bearing were detected
    and temperature of the bearing is normal
    and transversal section of the bearing shell is circular,        (15.6)
    then modification of shape of the bearing shell is recommended.
Some systems admit a developed (the so-called full) form of the rules:

RULE:
    if premise
    then conclusion 1    (15.7)
    else conclusion 2.

It should be stressed that the full form of the rules (15.7), which is
admissible from the formal viewpoint, often leads to the acceptance of
conclusions unexpected by the author of the rules. That occurs particularly
in large expert systems. The reason is that, because of the assumed
completeness of the database, the lack of a statement is treated as the
information that the statement is false. Therefore, the application of
rules only in the basic form (15.4) is recommended. It simplifies the
operations carried out by the rule interpreter. A particular advantage is
that a practical assumption about the analysis of the rules can be made:
the process of premise interpretation is interrupted (with a negative
result) as soon as the first unfulfilled (false) condition is met.
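
The difference between the basic form (15.4) and the full form (15.7) can be illustrated with a minimal interpreter sketch. The representation of rules as tuples and the alternative conclusion "clearance correct" are assumptions introduced only for this example.

```python
def premise_holds(conditions, facts):
    """Evaluate a conjunction; stop at the first condition not found in `facts`."""
    for condition in conditions:
        if condition not in facts:   # a missing statement is treated as false
            return False
    return True


def apply_basic(rule, facts):
    """Basic form (15.4): add the conclusion only when the premise holds."""
    conditions, conclusion = rule
    return facts | {conclusion} if premise_holds(conditions, facts) else facts


def apply_full(rule, facts):
    """Full form (15.7): otherwise add the alternative conclusion."""
    conditions, conclusion1, conclusion2 = rule
    chosen = conclusion1 if premise_holds(conditions, facts) else conclusion2
    return facts | {chosen}


# The bearing temperature has not been measured yet, so nothing is known about it.
facts = {"whirl vibration detected"}
premise = ["whirl vibration detected", "temperature decreased"]
basic = (premise, "clearance too high")
full = (premise, "clearance too high", "clearance correct")

after_basic = apply_basic(basic, facts)
after_full = apply_full(full, facts)
```

With the basic form nothing is concluded from the missing measurement, whereas the full form accepts "clearance correct" merely because the temperature statement is absent.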

15.3.3. Inference schemes 

Inference is a process in which the previously unknown truth degree of a
conclusion is derived on the basis of premises which are considered as
true. Inference in which the conclusion logically results from the premises
is called deductive inference. There are only two basic reliable schemes of
deductive inference in the classical logic:

• the scheme modus ponens: for the propositions p and q, if the implication
$p \rightarrow q$ is true and the proposition p is true, then the
proposition q is true; this scheme is written in the following form:

    $p \rightarrow q$      it rains $\rightarrow$ grass is wet
    $p$                    it rains                                (15.8)
    $\therefore q$         thus grass is wet

• the scheme modus tollens: for the propositions p and q, if the
implication $p \rightarrow q$ is true and the proposition q is false, then
the proposition p is false; the scheme is written in the following form:

    $p \rightarrow q$      it rains $\rightarrow$ grass is wet
    $\neg q$               grass is not wet                        (15.9)
    $\therefore \neg p$    thus it does not rain

In the case when the only knowledge we possess is that the premise p is
false (or that the conclusion q is true), there is no reliable inference
scheme which lets us estimate the logical value of the proposition q (or,
respectively, p). The schemes (15.10) and (15.11) can be used only for
formulating hypotheses, which have to be proven separately:

    $p \rightarrow q$      it rains $\rightarrow$ grass is wet
    $q$                    grass is wet                            (15.10)
    $\therefore p$         thus it rains

    $p \rightarrow q$      it rains $\rightarrow$ grass is wet
    $\neg p$               it does not rain                        (15.11)
    $\therefore \neg q$    thus grass is not wet
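
The reliability of the four schemes can be checked by enumerating the truth table. The following Python sketch treats a scheme as reliable when its conclusion is true in every valuation in which all its premises are true; the helper names are hypothetical.

```python
from itertools import product


def implies(p, q):
    """Material implication p -> q."""
    return (not p) or q


def is_valid(premises, conclusion):
    """True if the conclusion holds whenever all premises hold."""
    return all(conclusion(p, q)
               for p, q in product([False, True], repeat=2)
               if all(premise(p, q) for premise in premises))


rule = lambda p, q: implies(p, q)

modus_ponens = is_valid([rule, lambda p, q: p], lambda p, q: q)           # (15.8)
modus_tollens = is_valid([rule, lambda p, q: not q], lambda p, q: not p)  # (15.9)
scheme_15_10 = is_valid([rule, lambda p, q: q], lambda p, q: p)           # (15.10)
scheme_15_11 = is_valid([rule, lambda p, q: not p], lambda p, q: not q)   # (15.11)
```

The valuation in which p is false and q is true (the grass is wet because a sprinkler was used) is the counterexample to both (15.10) and (15.11).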





The schemes (15.8) and (15.9) allow inferring about conclusions on the basis 
of premises and about premises on the basis of conclusions. It entails that the 
schemes can be applied in the case of forward as well as backward inference. In 
both cases the direction of inference is determined independently of the order 
of consecutive actions carried out by expert systems. Particularly, this order 
can correspond to the forward inference strategy (on the basis of data) and 
to the backward inference strategy (hypotheses verification). The selection 
of inference algorithms for the expert system is often connected with the 
selection of an appropriate search strategy of possible solutions from two 
search strategies: the depth-first and the breadth-first search. 
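
The two inference strategies can be sketched for rules without the 'or' functor as follows. The rule set, including the conclusion "shoes get wet", is hypothetical, and the backward procedure assumes that the rules contain no cycles.

```python
RULES = [
    (["it rains"], "grass is wet"),
    (["grass is wet"], "shoes get wet"),
]


def forward_chain(facts, rules):
    """Data-driven strategy: fire rules until no new conclusion appears."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conclusion not in known and all(c in known for c in conditions):
                known.add(conclusion)
                changed = True
    return known


def backward_chain(goal, facts, rules):
    """Goal-driven strategy: verify a hypothesis by a depth-first search."""
    if goal in facts:
        return True
    return any(all(backward_chain(c, facts, rules) for c in conditions)
               for conditions, conclusion in rules if conclusion == goal)


derived = forward_chain({"it rains"}, RULES)
confirmed = backward_chain("shoes get wet", {"it rains"}, RULES)
rejected = backward_chain("shoes get wet", set(), RULES)
```

Both procedures use only the modus ponens scheme; they differ in whether the search starts from the data or from the hypothesis to be verified.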

15.3.4. Non-monotonic inference 

Inference with the use of classical proposition logic is always a monotonic 
process. It entails that new premises cannot cause a change of previously 
accepted conclusions. Research concerning inference in the so-called common 
sense reasoning suggests that non-monotonic logic should be introduced, be- 
cause it is impossible to state that each conclusion used in everyday life (e.g., 
there is a need to take an umbrella) results from the formalised monotonic 
inference process. However, at the same time, there is no reason for suppo- 
sitions that in doubtful situations conclusions are accepted accidentally. To 
minimize the risk related to the acceptance of wrong conclusions (decisions), 
knowledge and experiences are applied in order to find reasonable conclusions. 
As opposed to the classical logic, which is based on the term of truthfulness, 
the non-monotonic logic is based on the term of reasonableness. 

If one omits here complicated formal discussions, the following model of 
a non-monotonic inference rule can be assumed: 

    if it is known that A
    and there is no information about B,    (15.12)
    then we assume that C.
It should be strongly stressed that the application of the scheme (15.12)
can lead to situations where accepting a selected proposition as true
(e.g., the proposition B in (15.12)) means that the logical value of a
previously accepted conclusion (e.g., the proposition C in (15.12)) needs
to be changed. The possibility of modifying previously found conclusions is
a particular property of non-monotonic inference.
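
A minimal sketch of the scheme (15.12) is given below. Representing knowledge as a set of strings and encoding negative information as "not B" are assumptions made only for this illustration.

```python
def conclusions(facts):
    """Scheme (15.12): if A is known and nothing is known about B, assume C."""
    derived = set()
    about_b_known = "B" in facts or "not B" in facts
    if "A" in facts and not about_b_known:
        derived.add("C")
    return derived


before = conclusions({"A"})       # nothing is known about B, so C is assumed
after = conclusions({"A", "B"})   # the new premise B withdraws the conclusion C
```

Adding a premise shrinks the set of conclusions here, which cannot happen in monotonic inference.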

15.3.5. OR functor 

The premises of rules can be complex statements which consist of simple 
logical propositions joined by means of the functor ‘or’: 

RULE: 

if it rains, or sprinkler has been used, then grass is wet. (15.13) 

Because of the specific requirements of inference systems, the application 
of the ‘or’ functor is acceptable but not advised. In order to simplify the 
verification of rules, which should be conducted by their author, the ‘or’ 
functor should not be used. A complex rule which includes the functor is 
advised to be replaced by a set of rules that do not contain this functor. 

For example, the last rule can be replaced with the following two rules:

RULE:
    if it rains, then grass is wet.    (15.14)

RULE:
    if sprinkler has been used, then grass is wet.    (15.15)
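
The replacement of a rule containing the 'or' functor by a set of rules without it is mechanical, as the following sketch suggests. The representation of a rule as a pair (list of conditions, conclusion) is an assumption of this example.

```python
def split_or_rule(alternatives, conclusion):
    """Replace 'if a1 or a2 or ... then c' by one 'or'-free rule per alternative."""
    return [([alternative], conclusion) for alternative in alternatives]


rules = split_or_rule(["it rains", "sprinkler has been used"], "grass is wet")
```

Each resulting rule can then be verified by its author independently of the others.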

15.3.6. Context 

The classification of rule sets lets us distinguish between two kinds of
rules, called simple rules and complex rules, respectively. Simple rules
are characterised by a simple premise part: their premise contains only
one condition. A complex rule is a rule which allows determining the
conclusions directly. Intermediate conclusions determined with the use of
other rules need not be taken into account. Such rules are characterised by
complete, often very complex premises, which include a large number of
intermediate conditions. An example of the complex rule is

RULE:
    if condition1 and condition2 and ... then conclusion.    (15.16)

Complex rules require only simple interpreters. Their disadvantage is that
a proper rule set is difficult to formulate, and its verification and
completion are complicated. Additionally, these rules are characterised by
large-scale redundancy: the same conditions are often repeated in several
rules.

A different approach consists in the application of auxiliary rules, whose
conclusions are intermediate. In this case, the premise part is not
complex. As a result, the inference process consists in consecutive
analyses of more and more detailed intermediate conditions leading to the
final one. The advantage of that approach is that the verification of rules
is simple and their redundancy can be limited. The disadvantage is that the
operations carried out by the rule interpreter are usually complex.

Let us limit our consideration to rules whose premise parts do not contain
the functor 'or'. Then the rules can be defined as follows:

RULE:
    if $A_1$ and $A_2$ and ... and $A_k$ then $B$.    (15.17)

According to propositional logic, this rule is equivalent to the rule in
which the order of the conditions $A_1$ and $A_2$ was changed:

RULE:
    if $A_2$ and $A_1$ and ... and $A_k$ then $B$.    (15.18)

It is important that this change does not alter the value of the premise.
However, one should remember that the analysis of premises in expert
systems is often related to numerous side-effects. For example, other rules
may have to be analysed or some additional questions may have to be posed
to the user. In order to limit the operations carried out by the
interpreter, it is assumed that the analysis of the logical statement
(premise) is interrupted as soon as its logical value is known, i.e., after
the first false element is met. The change of the order of the premises
influences, among other things, the order of the questions asked and the
change of their thematic range.

The term context C was introduced in order to emphasize the importance of
the order of premises and to limit the possibility of changing this order.
The definition of a rule with the use of the context can be written as
follows:

RULE:
    if $C_k$ and $A_k$ then $B$,    (15.19)

where

    $C_k = A_1$ and $A_2$ and ... and $A_{k-1}$    (15.20)

is the context of the element $A_k$. Similarly,

    $C_{k-1} = A_1$ and $A_2$ and ... and $A_{k-2}$,
    ...
    $C_3 = A_1$ and $A_2$,                              (15.21)
    $C_2 = A_1$.

Introducing the context enables us to consider necessary conditions which
determine whether the evaluation of a given element of the premise is
purposeful. It leads to a distinct structuralisation of the rule set.
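
The role of the context can be illustrated with a sketch in which evaluating a condition has a side-effect, namely a question posed to the user. The conditions and the canned answers below are hypothetical.

```python
def evaluate_premise(conditions, ask):
    """Evaluate A_1 and ... and A_k in the given order.

    A_i is examined only when its context C_i = A_1 and ... and A_(i-1) holds;
    `ask` obtains the value of a condition, e.g. by questioning the user.
    """
    for condition in conditions:
        if not ask(condition):
            return False   # the context of the remaining elements is false
    return True


asked = []
answers = {"machine is running": False, "vibration level is high": True}


def ask(condition):
    asked.append(condition)   # record which questions were actually posed
    return answers[condition]


result = evaluate_premise(["machine is running", "vibration level is high"], ask)
```

Because the context of the second condition is false, the question about the vibration level is never posed; reversing the order of the conditions would pose it first.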

15.3.7. Production rules 

The enumerated kinds of rules make it possible to represent knowledge in the 
declarative form, but, at the same time, they do not enable us to represent 
simply the knowledge of processes, the information about the sequences of 
performed operations or strategies of action, etc. This inconvenience is partic- 
ularly visible in expert systems designed for the needs of technical diagnostics. 
For example, in this case, one of the elements of the knowledge base should be 
information about the order of different auxiliary examinations (from cheap 
and simple to complex and detailed ones). In order to represent this type of
knowledge in the form of rules, the notion of the rule defined previously was
generalised. It was assumed that the conclusion of the rule may be the
description of an operation or a command to perform it. In this case the
conclusion is not a statement:

if premise then operation. (15.22)

A rule defined in such a way is called a production rule. Its conclusion is 
interpreted as an instruction for a proper operation. These instructions are 
not connected with any logical value. In order to distinguish them from the 
rules previously described, which are not production ones, they will be called 
inference rules. 

An example of a production rule is as follows:

RULE:

if symptoms of whirl vibration in the slide bearing were detected
and temperature in the bearing is too high,
then verification of thermal calculations of the bearing
is required to be done. (15.23)

This generalisation makes it possible to widen the range of possible appli- 
cations of the rule. In particular, it allows formulating rules concerning the 
course of the inference process. A description of the inference strategy
is called a meta-rule. The notation of knowledge in the form of production
rules can be interpreted as a declarative-procedural representation. It should
be stressed that inference rules can be understood as a particular form of 
production rules. 
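
The relation between the two kinds of rules can be sketched in a few lines of code. This is only an illustration under stated assumptions: the class `ProductionRule`, the helpers `assert_statement`, `request` and `fire`, and the statement name `bearing_overheated` are hypothetical and do not come from any existing shell. A production rule carries an operation as its conclusion; an inference rule is obtained as the special case in which the operation asserts a statement:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Facts = Dict[str, bool]


@dataclass
class ProductionRule:
    premise: List[str]                   # statements that must all be true
    operation: Callable[[Facts], None]   # instruction executed when the premise holds


requested: List[str] = []  # operations requested from the maintenance staff


def request(task: str) -> Callable[[Facts], None]:
    """Conclusion of a production rule: a command, with no logical value."""
    def operation(facts: Facts) -> None:
        requested.append(task)
    return operation


def assert_statement(name: str) -> Callable[[Facts], None]:
    """Inference rule as a special production rule: the operation asserts a conclusion."""
    def operation(facts: Facts) -> None:
        facts[name] = True
    return operation


def fire(rules: List[ProductionRule], facts: Facts) -> None:
    """Execute the operation of every rule whose premise is satisfied."""
    for rule in rules:
        if all(facts.get(name, False) for name in rule.premise):
            rule.operation(facts)


rules = [
    # Rule (15.23) written as a production rule.
    ProductionRule(["whirl_vibration_symptoms", "bearing_temperature_too_high"],
                   request("verify thermal calculations of the bearing")),
    # Hypothetical inference rule, added only for comparison.
    ProductionRule(["bearing_temperature_too_high"],
                   assert_statement("bearing_overheated")),
]
facts: Facts = {"whirl_vibration_symptoms": True,
                "bearing_temperature_too_high": True}
fire(rules, facts)
```

After `fire` is called, `requested` holds the operation demanded by the production rule, while the inference rule has only added a new true statement to `facts`.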

15.3.8. Explanations 

Explanation modules are among the most important elements of expert systems.
Their great usefulness comes from the fact that the user of an expert system
needs explanations in order to approve the conclusions formulated by the
system. These explanations are also required because the knowledge base of
the expert system can go beyond the knowledge of the user. At the same time,
according to the present legislative acts, in most countries the total
responsibility for the results of the operation of expert systems is borne by
their users.

In order to provide proper information related to explanations, one can 
develop basic forms of rules and write them jointly with proper explanations: 

RULE: 



if premise then conclusion because explanation. (15.24) 

15.3.9. Sets of rules 

When the user allows it, the expert system should be able to start and con- 
tinue the inference process automatically as well as to undertake actions re- 
lated to this process. The application of production rules requires designing 
a proper interpreter. Deciding about the order of the analysis of conditions 
and eventually the execution of rules is a difficult task. It is particularly im- 
portant in large complex systems that consist of a few thousands of rules. 
The degree of the influence of the system’s constructor on the operation of 
the rule interpreter (the so-called inference mechanism) is dependent on pro- 
gramming tools used when the system is being constructed. The interpreter is 
not required to be designed, when complex software such as shell systems or 
specialized programming languages (e.g., PROLOG) are applied to develop 
the system. It does not mean that the constructor may pass over the princi- 
ples of the operation of the rule interpreter. The form of the knowledge base 
and particularly the way the rules are formulated should match the given 
inference mechanism. 

A correct inference process should be carried out according to the assumed
operation strategy, which is defined with the use of meta-rules, or by
grouping and ordering the rules. Monotonic and non-monotonic rules should
be considered separately. In the expert system, a characteristic property of 
the set of monotonic rules (as opposed to non-monotonic ones) is the fact that 
the result of inference does not depend on the order of their analysis. The or- 
der may exercise its influence only when the expert system operates. In most 
cases, the rules applied, particularly approximate ones, are non-monotonic. 
That entails that the constructor of the system should determine the order 
of their consideration. Because of the requirements for expert systems, which
concern mainly the simplicity of their modification and completeness, this
order should be declarative, not procedural. The rules applied for this purpose
are meta-rules. In order to unify the ways of rule and meta-rule notation, it 
is purposeful to introduce additional statements which determine the end of 
selected tasks carried out by the system. 

15.4. Representation of approximate knowledge

Numerous expert systems (e.g., diagnostic ones) can be considered as partic- 
ular models of approximate inference conducted by specialists in given do- 
mains. Approximate inference in expert systems (understood as an equivalent 
of such inference processes) consists, among other things, in the application 
of either: 

• exact rules and approximate premises, 

• approximate rules and exact premises, or 

• approximate rules and approximate premises, 

as opposed to exact inference, which is characterised by the application of: 

• exact rules and exact premises. 

The term representation of approximate knowledge is used interchangeably
with the expression representation of uncertain knowledge, where uncertainty
is defined as the lack of proper information necessary to make a decision.
The sources of uncertainty in the discussed systems can be:

• the randomness of the analysed events, 

• measurement deviations, 

• the presence of hidden (passed over) properties or parameters which 
result in a random character of the phenomena described by an incomplete 
set of variables, 

• a lack of proper knowledge or the existence of contradictions in it. 

Most people can formulate conclusions and make decisions on the basis
of inaccurate, uncertain and incomplete or partly contradictory data. This
motivates the search for formalised ways of such behaviour. The results
obtained with the use of the presently known algorithms of approximate
inference can be regarded as satisfactory, and in some situations they are
even good. When making a decision about the application of approximate
inference in an expert system, one should strictly define the ways of
determining:

• uncertainty of approximate statements, 

• uncertainty of approximate inference rules, 

• uncertainty of premises which consist of several statements, 

• uncertainty of conclusions which result from approximate premises and 
approximate rules, 

• uncertainty of conclusions estimated with the use of several independent 
approximate rules. 

There is no general method of approximate inference that would be adopted
by all constructors of expert systems. Several categories of the degree of
certainty of premises, conclusions and rules are applied. A very important
and difficult task is to establish a proper interpretation of the
measures (degrees of certainty) applied. One of the easiest ways to do it is
to apply probabilities. They can be interpreted both as the probabilities
that the statements appearing in the premise of a rule are true and as
conditional probabilities assigned to the rules. That enables us to estimate
the probabilities of statements which are conclusions of rules. The conclusions
can be understood as hypotheses. Statistical methods of verifying hypotheses
allow making a decision about their truthfulness, after which they are
approved as theorems. There are a lot of opinions recommending such a solution.
However, its main disadvantage is the need to assume the statistical
independence of premises in the rule set. Possibilities of a practical
verification of the truthfulness of this assumption are often very limited.
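
As a worked illustration of this probabilistic interpretation, consider the rule "if $A_1$ and $A_2$ then $B$". The numerical values below are hypothetical and serve only the arithmetic: $P(A_1) = 0.9$, $P(A_2) = 0.8$, and the conditional probability assigned to the rule is $P(B \mid A_1 \wedge A_2) = 0.95$. Under the independence assumption discussed above:

```latex
\begin{align*}
P(A_1 \wedge A_2) &= P(A_1)\,P(A_2) = 0.9 \cdot 0.8 = 0.72
  && \text{(independence assumed)},\\
P(B) &= P(B \mid A_1 \wedge A_2)\,P(A_1 \wedge A_2)
      + P(B \mid \neg(A_1 \wedge A_2))\,P(\neg(A_1 \wedge A_2)) \\
     &\ge 0.95 \cdot 0.72 = 0.684 .
\end{align*}
```

The rule alone yields only a lower bound on $P(B)$, since the second term of the total probability formula is not specified by it. If $A_1$ and $A_2$ are in fact dependent, the product in the first line is no longer justified, which is exactly the weakness indicated in the text.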

The assumed principles of operations with the use of the degrees of cer- 
tainty can cause some doubts. The main one is related to the fact that the 
logical value of the statement (true, false) is often identified with its certain- 
ty. Numerous doubts are also connected with side effects of the models of 
propagating uncertainty and inaccuracy in the inference set. 

There are interesting methods in which uncertainty and inaccuracy are
taken into account by means of measures that are pairs of numbers,
interpreted as, e.g., lower and upper bounds of the degrees of the
credibility, necessity or sufficiency of the rules. The sufficiency measure
determines the degree to which a totally true premise is enough for the
conclusion to be approved as true. Similarly, the measure of the necessity of
the rule determines the degree to which a true premise is necessary when the
conclusion is true. There are examples of a modification of the classical
inference schemes (modus ponens, modus tollens) which allows us to take into
account the measures of the possibility and necessity of statements as well
as the measures of the sufficiency and necessity of rules (two-sided
implication) (Cholewa and Kazmierczak, 1995). A separate group of methods
which describe the uncertainty and inaccuracy of rules consists of methods
which consider statements as fuzzy values and rules as fuzzy relations.



15.5. Static and dynamic expert systems 

15.5.1. Static expert systems 

The elements shown in Fig. 15.1 are the basic parts of the so-called static 
expert systems, which are characterised by the following properties: 

• the context determining the range of the inference process is static and 
the premises are independent of time; 

• the inference process is monotonic, thus an increase in the set of
accepted premises does not lead to changes of previously approved conclusions.

The above assumptions make it possible to define simple principles of the
operation of such systems. However, it is worth stressing that these
assumptions do not correspond to each and every situation occurring in the
case of diagnostic research. The goal of these examinations is the estimation
of the technical state of the analysed object. Changes of this state are
related to changes of input data for the inference process. The data can be
obtained from several independently operating sources.

15.5.2. Dynamic expert systems 

Dynamic expert systems are particularly designed to operate in an on-line 
mode, while the environment varies. The purpose of their application is to 
carry out determined tasks within limited time periods, when information is 
also limited. 

In the literature there can be found different strategies of inference, con- 
trolled respectively by the data and the inference goal, which are at the same 
time understood as strategies of the operation of expert systems. 

In the case of dynamic systems it is reasonable to additionally differentiate 
between systems operating in a cyclic way, systems designed for a continuous 
(cyclic) analysis (interpretation) of data, and systems formulating answers 
to the user’s questions. When making a decision about the application of 
the dynamic expert system it is necessary to establish the way in which the 
consecutive tasks are going to be performed within a limited time period 
and with the use of limited information. The following questions must be 
answered: 

• should the system operate very fast? (It is obvious that there may
always occur a task which cannot be solved even by the fastest system.);

• should the system guarantee a determined time period of task realisation?
(It means that one searches for the best solution among all possible
ones which can be examined within a given limited time period.);

• is it necessary that the system undertake all particular tasks?

Dynamic systems are often included in supervision systems of large
machinery and industrial processes. In these cases, additional factors which
complicate the undertaken tasks are the changeable conditions of
operation and the large number of analysed signals. Moreover, in the case of
dynamic systems, two time scales should be taken into account, namely the
micro and the macro scale. The tasks of constructing these systems and
putting them into operation are difficult. They require considering different
factors usually omitted in the case of static systems. In the process of
their construction, one can take into account simplifications resulting from
the limited dynamics of changes of the technical state of the examined real
objects. Then, the set of the analysed hypotheses can be known and limited.
It enables us to perform the inference process in the “closed world”,
determined by the known set of hypotheses. However, the set is usually very
large, and the application of strategies which search the complete space of
solutions is impossible.

The agendas of dynamic expert systems are characterised by the following 
assumptions: 

• they contain the set (e.g., a list, table of contents) of tasks to be solved; 

• it is possible that static agendas restored cyclically as well as varying 
agendas may appear simultaneously; 

• the agenda is not a queue of tasks and the tasks included in the agenda 
are not ordered; 

• some priorities can be set for the tasks included in the agenda; they 
may vary (increase or decrease) while they wait for the moment the task will 
be carried out; 

• defined packages of tasks can be included in the agenda in determined 
time moments or after identifying the determined technical state; 

• there is a lack of certainty that all the tasks included in the agenda will 
be performed. 
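
The assumptions listed above can be sketched as a small data structure. The sketch is only one possible reading of them, not a description of an existing system; the names `Task`, `run_cycle`, the field `aging` and the example tasks are assumptions made for this illustration. The agenda is an unordered collection, priorities change while tasks wait, and a limited time budget means that some tasks may not be performed in a given cycle:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Task:
    name: str
    priority: float
    duration: float      # expected time needed to perform the task
    aging: float = 0.0   # priority change per cycle spent waiting (may be negative)


def run_cycle(agenda: Dict[str, Task], time_budget: float) -> List[str]:
    """Perform the highest-priority tasks that fit into the time budget."""
    performed: List[str] = []
    remaining = time_budget
    # The agenda itself is unordered; the order is derived from current priorities.
    for task in sorted(agenda.values(), key=lambda t: t.priority, reverse=True):
        if task.duration <= remaining:
            remaining -= task.duration
            performed.append(task.name)
    for name in performed:
        del agenda[name]
    # Priorities of the waiting tasks vary while they wait.
    for task in agenda.values():
        task.priority += task.aging
    return performed


agenda = {
    "check_bearing": Task("check_bearing", priority=5.0, duration=2.0),
    "trend_analysis": Task("trend_analysis", priority=1.0, duration=3.0, aging=3.0),
    "update_report": Task("update_report", priority=2.0, duration=2.0),
}
first = run_cycle(agenda, time_budget=4.0)
priority_after_waiting = agenda["trend_analysis"].priority
second = run_cycle(agenda, time_budget=4.0)
```

In the first cycle the task `trend_analysis` does not fit into the budget and remains in the agenda with an increased priority; it is performed in the second cycle.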

The systems being discussed can also include schedules of the tasks, which 
contain the following information corresponding to each task: 

• a description of the task being undertaken and its priority, 

• the time of the beginning of the performance of the task (expected, 
advised minimal or acceptable maximal), 

• the time of performing (duration of) the task (expected, expected min- 
imal or maximal). 

There are different methods of organising the inference process in dynamic
systems. They ensure the identification of a rational conclusion in
changeable outside conditions, which entail changes of premises. An effective
way is to freeze the outside interactions which influence the system (or the
environment description) while the basic inference cycle is performed
(Fig. 15.2). Figure 15.2 shows this frozen description as the operation copy.
It enables us to consider the dynamic system as quasi-static (static within
one single inference cycle) and to apply simple inference methods elaborated
for static systems.
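
The idea of the operation copy can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the statement `temperature` and the threshold of 90 are hypothetical. The inference is performed on a deep copy of the environment description, so a change of the environment during the cycle does not affect the premises used in that cycle:

```python
import copy
from typing import Callable, Dict

Statements = Dict[str, float]


def inference_cycle(environment: Statements,
                    infer: Callable[[Statements], Statements]) -> Statements:
    """Run one basic cycle on a frozen snapshot (operation copy) of the environment."""
    snapshot = copy.deepcopy(environment)  # later changes do not reach the snapshot
    return infer(snapshot)


def static_infer(premises: Statements) -> Statements:
    """Hypothetical static rule: overheating if the temperature exceeds 90."""
    return {"overheating": 1.0 if premises["temperature"] > 90.0 else 0.0}


live: Statements = {"temperature": 95.0}


def infer_with_disturbance(premises: Statements) -> Statements:
    # The environment changes while the inference is still running.
    live["temperature"] = 20.0
    return static_infer(premises)


conclusion = inference_cycle(live, infer_with_disturbance)
next_conclusion = inference_cycle(live, static_infer)
```

The first conclusion is based on the frozen value 95, although the live value has already changed; the change is taken into account only in the next cycle.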




Fig. 15.2. Basic cycle of the operation of the dynamic expert system

The operation of freezing the environment makes it possible to limit
numerous incompatibilities appearing within dynamic systems. It entails, among
other things, the elimination of the results of unknown delays, which can
appear between changes of the premise values and the conclusion
that is their effect. Apart from the distinction between dynamic and static
systems, one should also distinguish systems characterised by a limited
acceptable time of operation (searching for the solution) and systems
characterised by a guaranteed time of inference.

Both categories may appear in the case of the static as well as dynamic
systems. A particularly interesting group consists of dynamic systems which
are characterised by a guaranteed time of inference. They can be applied in
the case of cyclic operations (Fig. 15.2). The basis of their operation are
the results of continuous observations of objects that vary in time. The
limited time of their operation is essential in order to ensure the
possibility of realising the basic cycle (in the determined time period).

To secure this limited time, incremental (so-called anytime) algorithms
are introduced. They are very effective but their elaboration is difficult.
Their essence is included in the following facts:

• operations which are performed according to the algorithm can be in- 
terrupted at any moment; 

• the result is always accessible after interruption at any moment; 

• the quality of the obtained result improves monotonically; this improve- 
ment is a function of time in which the operation is performed. 
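
The three properties listed above can be sketched with a generator that keeps the best result found so far. The sketch is an assumption-laden illustration: the names `anytime_best` and `run_with_budget`, the hypotheses and their scores are hypothetical, and the time limit is expressed as a number of steps (at least one) instead of wall-clock time in order to keep the example deterministic:

```python
from typing import Callable, Dict, Iterable, Iterator, Optional, Tuple, TypeVar

H = TypeVar("H")


def anytime_best(candidates: Iterable[H],
                 score: Callable[[H], float]) -> Iterator[Tuple[Optional[H], float]]:
    """After each examined candidate, yield the best candidate found so far."""
    best: Optional[H] = None
    best_score = float("-inf")
    for candidate in candidates:
        value = score(candidate)
        if value > best_score:
            best, best_score = candidate, value
        yield best, best_score  # a result is available after every step


def run_with_budget(steps: Iterator[Tuple[Optional[H], float]],
                    budget: int) -> Optional[Tuple[Optional[H], float]]:
    """Interrupt the algorithm after `budget` steps and return the last result."""
    result = None
    for count, result in enumerate(steps, start=1):
        if count >= budget:
            break
    return result


# Hypothetical hypotheses with hypothetical degrees of support.
scores: Dict[str, float] = {"misalignment": 0.4, "unbalance": 0.7,
                            "bearing wear": 0.9, "looseness": 0.2}

early = run_with_budget(anytime_best(scores, lambda h: scores[h]), budget=2)
full = run_with_budget(anytime_best(scores, lambda h: scores[h]), budget=10)
qualities = [q for _, q in anytime_best(scores, lambda h: scores[h])]
```

An interruption after two steps returns the best of the first two hypotheses; a longer run can only keep or improve the quality of the result, which is why the list `qualities` is non-decreasing.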

The basic difference between the division into static and dynamic systems
and the division into systems with a limited and an unlimited time of
operation is that the first division can lead to dynamic non-monotonic
systems, which can negate previous conclusions when new data appear.

It should be stressed that the lack of monotonicity is an expected property 
of diagnostic expert systems. The result of inference conducted with the use of 
such systems should be the identification of the technical state of the analysed 
object. The state can vary in time and the systems are applied in order to 
identify these changes. It implies, e.g., that the conclusion “object A is
usable for maintenance” is formulated and accepted as true only at one selected time
moment. At other moments it may be false, which leads to non-monotonic 
inference. 

15.5.3. Blackboard 

Let us focus our attention on expert systems which aid the diagnosing of
complex objects. They are usually characterised by a large space of possible
solutions, incomplete data and approximate, uncertain and incomplete
knowledge of undertaken tasks. The application of the general concept of a
blackboard (Fig. 15.3) is especially reasonable here.

Fig. 15.3. General concept of the blackboard



The blackboard is a place where messages containing information about 
the values of statements are available to receivers (C). The messages are pro- 
vided by units (A), which are the sources of messages to the administrator 
(B). This concept was introduced in systems designed for speech
recognition (Engelmore and Morgan, 1988) and it has been continuously
developed (e.g., Hayes-Roth, 1990; 1995). There have been some attempts at
defining specialized programming languages. An interesting remark is that
the blackboard can be understood as a hierarchically ordered database, de- 
signed for storing solutions generated by autonomous modules. The modules 
can apply different techniques of inference in order to achieve the best results. 

The blackboard can be also modelled as a network of mutually interacting 
statements (Fig. 15.4), according to the known set of rules. It operates as a 
module which transforms statements. 

The inference process, when the blackboard is included, consists in
updating messages placed on the blackboard (Fig. 15.3). This process is
supervised by the administrator and it is organized similarly to expert
systems in which knowledge is written in the form of rules.

One can consider different kinds of statement networks. They can be
characterised by numerous ways of defining statement values. In the following 
parts of the chapter, we discuss two ways of inference in complex statement 
networks. They include: 

• inference in networks of approximate statements, 

• inference in belief networks. 




Fig. 15.4. Network of dynamic statements: a* - outside 
interactions, b* - to the observer, x* - dynamic statements 

15.6. Inference in networks of approximate statements

There are many different ways of representing approximate statements. The 
present chapter discusses only two problems. The first one concerns the de- 
gree of truth. The second one is related to inference in networks containing 
approximate statements described by the degrees of truth. Particular atten- 
tion is paid to benefits resulting from a proper distinction between necessary 
and sufficient conditions. Inference applied in the discussed networks is re- 
lated to the assumption that the blackboard used includes the nodes of the 
network. 

15.6.1. Primary and secondary statements 

A number of publications which describe the concept of the blackboard can
be enumerated. Blackboards contain elements (messages) which are often
understood as active elements called agents. This concept is a very
interesting theoretical approach (Haddadi, 1995; Singh, 1994). However, some
difficulties appear when attempts at practical applications are made.

The main difference between the classical concept of the blackboard
(Engelmore and Morgan, 1988) and the model described here is the form of the
elements of the blackboard. These elements are called active statements.
They are not knowledge sources or domains. This name stresses that
changes of the statement values can initiate sequences of operations which
cause automatic changes of the values of other statements. The statements
available on the blackboard can belong to one of the following classes:

• primary statements, whose values do not depend on the values of other
statements; they are set directly by outer processes (e.g., modules which
activate the data);

• secondary statements, whose values are dependent on the values of
other statements, which play the role of premises (the values of secondary
statements are not set directly).

Secondary statements can be also interpreted as equivalents of conclusions 
in rules which correspond to selected fragments of the network of statements. 
In systems where rules are applied one often differentiates between the rules, 
their premises and conclusions. In the case of the discussed blackboard this 
distinction is not necessary. The elements of the blackboard are analysed 
together as the whole statement (the node of the network of statements - 
Fig. 15.4), whose value is equal to the degree of truth or belief. These
statements can be simple (dependent on only one premise) or complex (dependent
on more than one premise). The dependency of the complex secondary
statement on the premises can be:

• negation (operator NOT) of another statement;

• conjunction (operator AND) of necessary premises;

• alternative (operator OR) of sufficient premises;

• aggregation (operator AGG) of independent sufficient premises. 

A special role is played by the aggregation operator (Cholewa, 1985). This 
operation is a process of acquiring different, independent opinions, which 
should be considered as a whole. The opinions about the present state of the 
analysed object can be formulated on the basis of different sources. Often, 
a few statements are enough to approve a conclusion. According to classi- 
cal logic, in order to formulate the conclusion, only one sufficient premise 
should be known. However, taking into account the accuracy of real mea- 
surements, one can conclude that a larger number of the approved suffi- 
cient premises will usually lead to an improvement of the quality of this 
conclusion. 

All statements can change their values with time. The statements available 
on the blackboard gather histories of their changes in the form of a plot of 
these statements in the time domain. The collection of the total history of 
changes of the primary statements and the limited history of changes of the 
secondary statements make it possible to repeat (by the system) selected 
situations from the past. It can be applied both to verify the system and to 
generate explanations which describe the legitimacy of conclusions proposed 
by the system. 

15.6.2. Approximate value of the statement 

The values of approximate statements can be defined differently. In this
chapter it is assumed that the value $b(x)$ of the statement $x$ is the
measure of the acceptance of the statement (15.3). The simplest definition of
the value $b(x)$ of the statement $x$ is the application of logical constants
YES, NO:

$b(x) \in \{\mathrm{YES}, \mathrm{NO}\}$, (15.25)

which directly leads to propositional logic. However, diagnostic expert
systems require using approximate and uncertain statements. There are a lot of
ways of representing these statements. One of them is making the assumption
that the value of the statement $x$ is equal to its degree of truth $T(x)$,
which is a real number from the range [0, 1]:

$b(x) = T(x) \in [0, 1]$. (15.26)

15.6.3. Necessary and sufficient conditions 

In order to properly formulate the rules, one distinguishes two classes of
conditions (premises), called respectively sufficient and necessary conditions.

If approving the statement $x$ as true implies that the statement $y$ is also
true (but not necessarily conversely), the statement $x$ is called a
sufficient condition for $y$. At the same time the statement $y$ is a
necessary condition for $x$. If the statement $x$ is both a necessary and a
sufficient condition for $y$, then $y$ is also a necessary and sufficient
condition for $x$. It entails that for the two statements $x$ and $y$
characterised by the values

$b(x) \in [0, 1]$, $b(y) \in [0, 1]$, (15.27)

the information that the statement $x$ is a sufficient condition for $y$ can
be written as follows:

$b(y) \ge b(x)$, (15.28)

where, taking into account (15.27),

• approving the statement $x$ as true (thus $b(x) = 1$) results in
the fact that $b(y) = 1$, which is equivalent to approving the statement $y$
as true;

• approving the statement $x$ as false (thus $b(x) = 0$) results only in the
fact that $b(y) \ge 0$, which is a meaningless conclusion.

Analogously, the information that $y$ is a necessary condition for $x$ can
be written in the form

$b(x) \le b(y)$, (15.29)

where, taking into account (15.27):

• approving the statement $y$ as false (thus $b(y) = 0$) results in the fact
that $b(x) = 0$, which means the statement $x$ is false;

• approving the statement $y$ as true (thus $b(y) = 1$) results only in the
fact that $b(x) \le 1$, which is a trivial conclusion.



15.6.4. Approximate conditions 

Approximate conditions can be understood as necessary and sufficient
conditions met with small inaccuracy. Accordingly, the condition (15.28)
is transformed into the form

$b(y) \ge b(x) - \delta$, (15.30)

and the condition (15.29) into

$b(x) \le b(y) + \delta$, (15.31)

where $\delta \in [0, 1]$ is a fixed value common for all analysed conditions,
which determines the acceptable degree of the approximation of all conditions.
This value for the exact condition is $\delta = 0$.
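
A minimal numerical sketch of the approximate condition is given below. The function names and the example values are assumptions introduced for illustration. Note that (15.30) and (15.31) describe the same relation between $b(x)$ and $b(y)$, seen once from the side of the sufficient condition $x$ and once from the side of the necessary condition $y$, so one check serves both:

```python
def condition_violation(b_x: float, b_y: float, delta: float = 0.0) -> float:
    """Amount by which b(y) >= b(x) - delta is violated (0.0 if the condition holds)."""
    return max(0.0, b_x - delta - b_y)


def condition_holds(b_x: float, b_y: float, delta: float = 0.0) -> bool:
    """True if x is a sufficient condition for y with the tolerance delta."""
    return condition_violation(b_x, b_y, delta) == 0.0


# Hypothetical degrees of truth: b(x) = 0.75, b(y) = 0.5.
exact_ok = condition_holds(0.75, 0.5)                # exact condition, delta = 0
approx_ok = condition_holds(0.75, 0.5, delta=0.25)   # approximate condition
gap = condition_violation(0.75, 0.5)                 # smallest delta that would suffice
```

With these values the exact condition is not met, while the approximate one with $\delta = 0.25$ is; `gap` is the smallest tolerance for which the pair of values becomes acceptable.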

15.6.5. Approximate conjunction and alternative of statements

The conjunction of statements can appear in the case of necessary conditions.
The limitations concerning the values of statements which result from the
conjunction

$z = x$ AND $y$, (15.32)

can be written in the form of the set of inequalities similar to (15.31):

$b(z) \le b(x) + \delta$,
$b(z) \le b(y) + \delta$. (15.33)

The alternative of statements can appear in sufficient conditions. Limitations
for the values of statements which result from their alternative, defined as
follows:

$z = x$ OR $y$, (15.34)

can be written in the form of the set of inequalities similar to (15.30):

$b(z) \ge b(x) - \delta$,
$b(z) \ge b(y) - \delta$. (15.35)
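
The sets (15.33) and (15.35) bound the value $b(z)$ from one side only. A short sketch, with function names and example values assumed for illustration, shows that the two inequalities of each set reduce to a single bound: the minimum of the premise values plus $\delta$ for the conjunction, and the maximum minus $\delta$ for the alternative (clipped to the range [0, 1] required of every statement value):

```python
from typing import Sequence


def conjunction_upper_bound(premises: Sequence[float], delta: float = 0.0) -> float:
    """Largest b(z) allowed for z = x AND y ...: b(z) <= b(x_i) + delta for all i."""
    return min(1.0, min(premises) + delta)


def alternative_lower_bound(premises: Sequence[float], delta: float = 0.0) -> float:
    """Smallest b(z) allowed for z = x OR y ...: b(z) >= b(x_i) - delta for all i."""
    return max(0.0, max(premises) - delta)


# Hypothetical premise values b(x) = 1.0 and b(y) = 0.25, with delta = 0.25.
upper_and = conjunction_upper_bound([1.0, 0.25], delta=0.25)
lower_or = alternative_lower_bound([1.0, 0.25], delta=0.25)
```

For $\delta = 0$ these bounds coincide with the minimum and maximum of the premise values; the sketch makes no claim about how the value inside the admissible range is finally selected.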



15.6.6. Equilibrium state in networks of approximate statements 

There are a lot of ways of organising inference processes. Their common
feature is that they are all based (explicitly or implicitly) on the search for
proper paths between the known data and the hypotheses being verified. The
differences between them are the directions of paths, searching strategies, the 
selection of the solution and criteria determining the end of the process. Such 
an approach is characterised by the fact that the general solution is obtained 
on the basis of a sequence of solutions of local tasks. Identifying possible 
contradictions appearing in the database is difficult. 

Let us assume that the values $b(x)$ of the statements $x$, which are elements
of the analysed set of statements $X$,

$X = \{x_1, x_2, \ldots, x_H\}$, (15.36)

written in the matrix form

$b(x) = [b(x_1), \ldots, b(x_H)]^T$, (15.37)

should meet the set of inequalities

$A\,b(x) \ge 0$,
$1 \ge b(x) \ge 0$, (15.38)

where the sparse matrix $A$ is the formal representation of the knowledge
base. The set (15.38) can be contradictory, both because of the statements
$x \in X$ whose values are estimated on the basis of the results of
measurements and because of the knowledge base acquired from sources that
cannot be totally verified. In order to avoid these possible contradictions,
the set (15.38) is generalised to the form

$A\,b(x) + \delta \ge 0$,
$1 \ge b(x) \ge 0$, (15.39)
$\|\delta\| \to \min$,

where the solution to the set (15.39) is searched for under a criterion
of minimizing the selected norm $\|\delta\|$ of the elements of
the matrix $\delta$. The value of this norm can be understood as the measure
of the degree of contradiction of the analysed set of rules, which contains
the approximate conditions (15.30)-(15.31).

It is assumed that the identification of the equilibrium state of the network
can be considered as a linear programming problem, and searching for the
solution can be performed with the use of the commonly known procedures
of linear programming.

To solve the task with the use of linear programming, one begins by
making an assumption related to the output values of statements which are
unknown. Their default values are 0.5. It should be stressed that the state
characterised by such a value for all statements is one of the possible
solutions. However, this solution should not be the optimal one when the
knowledge base is correctly constructed, because it means only “nothing is
known”.

Transforming the inference task into a problem defined as a linear pro- 
gramming task allows us to apply highly effective, numerical algorithms. 
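
The structure of the inequality set can be illustrated with a small sketch. It does not solve the linear programming task; it only builds the rows of the matrix $A$ for conditions of the form (15.28) and, for a given vector of statement values, computes the smallest slack each row would need and a maximum norm of these slacks. The names `build_rows`, `required_slack`, the three statements and their values are assumptions made for this illustration; an actual solution would pass the same rows to a linear programming solver:

```python
from typing import List, Sequence, Tuple


def build_rows(statements: Sequence[str],
               sufficient: Sequence[Tuple[str, str]]) -> List[List[float]]:
    """One row per condition 'x is sufficient for y', encoding b(y) - b(x) >= 0."""
    index = {name: i for i, name in enumerate(statements)}
    rows: List[List[float]] = []
    for x, y in sufficient:
        row = [0.0] * len(statements)
        row[index[y]] = 1.0
        row[index[x]] = -1.0
        rows.append(row)
    return rows


def required_slack(rows: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Smallest non-negative slack per row such that (A b)_i + slack_i >= 0."""
    return [max(0.0, -sum(a * v for a, v in zip(row, b))) for row in rows]


statements = ["x1", "x2", "x3"]
rows = build_rows(statements, [("x1", "x2"), ("x2", "x3")])

# Hypothetical values partly contradicting the first condition.
slack = required_slack(rows, [0.75, 0.5, 1.0])
contradiction = max(slack)  # maximum norm of the slack vector

# The default state "nothing is known" satisfies every condition exactly.
default_slack = required_slack(rows, [0.5, 0.5, 0.5])
```

The default state with all values equal to 0.5 needs no slack at all, which agrees with the remark that it is always one of the possible solutions.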



15.7. Inference in belief networks 

There is a special class of belief statements, which are characterised by
probabilities interpreted as degrees of belief. These statements can form
networks called belief or Bayesian networks (Charniak, 1991; Henrion et al.,
1991). The belief network is a representation of a joint probability
distribution defined on a finite set of discrete random variables. Recently,
these networks have received wide recognition. They are considered as
effective solutions, useful for approximate inference, i.e., inference based
on approximate statements and/or rules. They have found application in
numerous domains. For example, one can enumerate a variety of projects
related to diagnostics and safety management for the aircraft industry and
the nuclear industry, as well as the Office Assistant modules included in
Microsoft Office, whose task is to advise the user, taking into account the
context.

15.7.1. Probability calculus

Belief networks came into being as a result of combining ideas derived from
different domains. They are based on statistical methods of inference, taking
advantage of the so-called Bayes’ model, theories of decision making and
methods of verifying statistical hypotheses. For a long time numerous authors
have suggested that these methods be the main tool of approximate inference.
This is motivated by the fact that the underlying mathematical methods are
well developed. Sceptics stress that the application of these methods requires
a proper measure of probability assigned to each statement. Estimating that
measure for a statement (and for a rule) is a difficult task, and the estimated
values can be subject to large errors. Consequently, the obtained result, in
the form of a degree of belief, can be subject to significant and unknown
errors. These errors appear regardless of how rigorous the mathematical
methods are. The formal definition of the probability measure P(x) states that
this function assigns to every event x a number in the range [0, 1]. Every
event is a subset x ⊂ Ω of the so-called space of events Ω.

Among different events included in the space two categories have a par- 
ticular meaning. These are: 

• independent events; the probability of their joint occurrence is a prod- 
uct of the probabilities of these events; 

• disjoint events; the probability of the occurrence of one of these events 
is a sum of their probabilities. 
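
In symbols, the two categories above correspond to the standard identities
below, written here for two events A and B (the symbols A and B are used only
for illustration; the second identity assumes that A and B cannot occur
together):

```latex
P(A \cap B) = P(A)\,P(B) \quad \text{(independent events)}, \qquad
P(A \cup B) = P(A) + P(B) \quad \text{(disjoint events, } A \cap B = \emptyset\text{)}.
```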

Different misunderstandings are caused by the ambiguity of the term prob- 
ability, which is most often understood as the frequency of the occurrence of 
an event: 

    P(e) = \lim_{N \to \infty} \frac{n(e)}{N}                        (15.40)

where n(e) is the number of favourable events which occur when observing 
all events N. 
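
The frequency interpretation (15.40) can be illustrated with a small
simulation. The following is only a sketch: the function name
`relative_frequency`, the event probability 0.3 and the seed are assumptions
chosen for illustration, and a pseudo-random generator stands in for real
observations.

```python
import random

def relative_frequency(p_event, n_trials, seed=0):
    """Estimate P(e) as n(e)/N from N simulated observations."""
    rng = random.Random(seed)
    # n(e): number of trials in which the event occurred
    favourable = sum(1 for _ in range(n_trials) if rng.random() < p_event)
    return favourable / n_trials

# The estimate approaches the assumed value 0.3 as N grows.
for n in (100, 10_000, 100_000):
    print(n, relative_frequency(0.3, n))
```

For small N the ratio n(e)/N may differ noticeably from the limiting value,
which is one reason why frequency-based estimates obtained from short
observation series should be treated with caution.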

Another interpretation treats probability as the so-called subjective
probability, i.e., a measure of belief. This definition is useful in, among
other things, the description of Bayesian networks. Both of these
interpretations of probability (frequency and belief) can be practically
applied. However, their simultaneous application may lead to results that
disagree with intuition. This is an effect of the fact that the commonly
accepted (often by default) relationship

    P(A) = 1 - P(\neg A)                                             (15.41)

is required to hold for measures of probability, where P(A) and P(¬A) are,
respectively, the probabilities that the event A occurred and did not occur.
This relationship results from the assumption that there are only two
alternatives: either the event A occurred or it did not. It corresponds to the
law of the excluded middle in classical logic.



15.7.2. Bayes’ model 

One of the main terms used in the probability calculus is an event. The
knowledge base of an expert system is a set of statements describing
events and the relationships between them. The probability P(A) of a statement
A, or, more precisely, the probability that the statement A can be accepted
as true, is equal to the probability of the occurrence of the event described
by the statement A. The relationships between the statements A and B
can be described with the use of the conditional probabilities P(A | B) and
P(B | A), which are assigned to these statements. These probabilities should
be clearly distinguished from the joint probability P(A, B) of the statements
A and B:

• joint probability P(A, B) is a measure of belief that the statements A
and B can be simultaneously accepted as true;

• conditional probability P(A | B) is a measure of belief that the
statement A can be accepted as true when the statement B
is true.



A very important property of this approach is that the conditional
probability P(A | B) applies regardless of whether or not there is a
cause-effect relationship between the events described by the statements
A and B. These probabilities satisfy the following relationships:

    P(A, B) = P(B)\,P(A \mid B),                                     (15.42)

    P(A, B) = P(A)\,P(B \mid A).                                     (15.43)



They are the basis of the theorem formulated about the year 1763 by Thomas
Bayes:

    P(A \mid B) = \frac{P(A, B)}{P(B)} = \frac{P(B \mid A)\,P(A)}{P(B)}.    (15.44)



It is also written in the form

    P(h_i \mid e_j) = \frac{P(e_j \mid h_i)\,P(h_i)}{P(e_j)}
                    = \frac{P(e_j \mid h_i)\,P(h_i)}{\sum_{i} P(e_j \mid h_i)\,P(h_i)},    (15.45)

where P(h_i | e_j) is the conditional probability that the hypothesis h_i is
true when it is known that the argument e_j is true, P(e_j | h_i) is the
conditional probability that the argument e_j is true when the hypothesis h_i
is true, and P(h_i) is the a priori probability that the hypothesis h_i
is true.
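
A direct numerical reading of (15.45) may help here. The sketch below is not
taken from the chapter: the function name `posterior` and all numbers are
hypothetical, and the hypotheses are assumed to be mutually exclusive and
exhaustive so that the denominator can be computed as a sum.

```python
def posterior(priors, likelihoods):
    """Return P(h_i | e_j) for every hypothesis h_i, following (15.45).

    priors      -- a priori probabilities P(h_i), assumed to sum to 1
    likelihoods -- P(e_j | h_i) for the single observed argument e_j
    """
    joint = [l * p for l, p in zip(likelihoods, priors)]
    evidence = sum(joint)  # P(e_j), the denominator of (15.45)
    if evidence == 0:
        raise ValueError("the observed argument has zero probability")
    return [j / evidence for j in joint]

# Hypothetical numbers: three fault hypotheses and one observed symptom.
priors = [0.7, 0.2, 0.1]
likelihoods = [0.1, 0.6, 0.9]
result = posterior(priors, likelihoods)
print(result)
```

In this illustration the hypothesis with the largest a priori probability is
no longer the most probable one once the symptom has been observed, which
shows how strongly the result depends on the estimated conditional
probabilities.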

Bayes’ theorem requires the assumption that the analysed task deals with
the so-called closed world, which includes independent, separate (15.46) and
exhaustive (15.47) hypotheses. In this world

    \sum_{i=1}^{n} P(h_i) = 1,                                       (15.46)

    \sum_{i=1}^{m} P(e_j \mid h_i) = 1.                              (15.47)

The described probabilities can be considered as subjective ones. This
means accepting that the value P(B | A) is acquired on the basis
of, among other things, questions posed directly to a specialist. However, it
should be stressed that this can cause numerous misunderstandings, because
conditional probability is often interpreted in terms of cause-effect
relationships. It has been stated many times that attempts at a
simultaneous estimation of the probabilities P(h), P(e), P(h | e), P(h | ¬e)
on the basis of experts’ opinions can lead to values that do not conform to
the theorem (15.45). That should be remembered when Bayes’ formula is to be
applied.
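
One simple way to detect such a lack of conformity is to compare the stated
P(h) with the value implied by the law of total probability,
P(h) = P(h | e)P(e) + P(h | ¬e)(1 − P(e)). The sketch below assumes this
particular check; the function name `total_probability_gap` and the numbers
are hypothetical.

```python
def total_probability_gap(p_h, p_e, p_h_given_e, p_h_given_not_e):
    """Stated P(h) minus the value implied by the law of total probability.

    A result of zero means that the four estimates are mutually coherent.
    """
    implied = p_h_given_e * p_e + p_h_given_not_e * (1.0 - p_e)
    return p_h - implied

# Hypothetical answers given independently by an expert.
gap = total_probability_gap(p_h=0.30, p_e=0.40,
                            p_h_given_e=0.80, p_h_given_not_e=0.10)
print(f"gap = {gap:+.3f}")
```

A non-zero gap does not say which of the four estimates is wrong; it only
signals that they cannot all be used together in Bayes’ formula.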

On the basis of the above remarks, one can conclude that statistical
methods are one of a variety of tools which may be applied when solving a
task of approximate inference. However, they are neither the best nor
the most universal tools. Their advantage is a compact and mathematically
rigorous description. Their disadvantage is that the values of conditional
and a priori probabilities, which determine the obtained solution, have to
be supplied. The values of these probabilities are extremely difficult to
estimate.

15.7.3. Belief networks 

A belief network is an acyclic, directed graph, which consists of nodes and
directed branches connecting the nodes. Discrete stochastic variables, as
collections of statements, are assigned to the nodes. The collection of
statements is represented by a vector which contains the contents of
the component statements written in the form of the expression (15.3). The
size of the collection is equal to the number of the component statements.
Most often, all statements included in the collection are characterised by the
same name of the object o(x) and the same name of the attribute a(x). The
difference between them is the value of the attribute v(x), e.g.,

    (oil in the bearing, temperature, low)

    (oil in the bearing, temperature, normal)                        (15.48)

    (oil in the bearing, temperature, high).

The vector of the component statements contains the corresponding values of
probabilities interpreted as degrees of belief. It is assumed that the
collection of statements contains statements that are mutually exclusive and
meet (15.46). The collection can also be exhaustive, which allows it to
meet (15.47). On the basis of these assumptions, one concludes that the
sum of the values of these degrees is equal to 1.0 for each collection of
statements.

The branches connecting the nodes of the belief network are characterised 
by tables containing the values of conditional probabilities, estimated for all 
elements of the Cartesian product of the collections of statements assigned 
to both nodes. 

Conditional probabilities assigned to graph branches (forming belief
networks) can also be interpreted as a special representation of the
cause-effect relationship. However, such an interpretation can lead to
misunderstandings. An interesting property of belief networks is the
possibility of transforming a network into an equivalent network with a
different structure. Figure 15.5 shows an example of a network for which the
joint probability before the transformation is

    P(A, B, C) = P(C \mid A)\,P(B \mid A)\,P(A),                     (15.49)

and after the transformation can be written as

    P(A, B, C) = P(C \mid A)\,P(B, A) = P(C \mid A)\,P(A \mid B)\,P(B).    (15.50)

According to Bayes’ theorem (15.44), the values of joint probability which
are estimated on the basis of (15.49) and (15.50) are equal. This allows us to
accept the equivalence of the networks shown in Fig. 15.5.
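
The equality of the two factorisations can be checked numerically. The sketch
below uses a hypothetical network with binary nodes; all probability values
and variable names are illustrative assumptions, and the parameters of the
transformed network are derived from the original ones with Bayes’ theorem.

```python
from itertools import product

STATES = (True, False)

# Hypothetical parameters of the network before the transformation.
p_a = {True: 0.3, False: 0.7}
p_b_given_a = {True: {True: 0.9, False: 0.1},   # indexed as [a][b]
               False: {True: 0.2, False: 0.8}}
p_c_given_a = {True: {True: 0.6, False: 0.4},   # indexed as [a][c]
               False: {True: 0.5, False: 0.5}}

def joint_before(a, b, c):
    """Joint probability P(C|A) P(B|A) P(A), as in (15.49)."""
    return p_c_given_a[a][c] * p_b_given_a[a][b] * p_a[a]

# Parameters of the transformed network: P(B) and P(A|B).
p_b = {b: sum(p_b_given_a[a][b] * p_a[a] for a in STATES) for b in STATES}
p_a_given_b = {b: {a: p_b_given_a[a][b] * p_a[a] / p_b[b] for a in STATES}
               for b in STATES}

def joint_after(a, b, c):
    """Joint probability P(C|A) P(A|B) P(B), as in (15.50)."""
    return p_c_given_a[a][c] * p_a_given_b[b][a] * p_b[b]

max_diff = max(abs(joint_before(a, b, c) - joint_after(a, b, c))
               for a, b, c in product(STATES, repeat=3))
print(max_diff)
```

The largest difference between the two joint distributions stays at the level
of floating-point rounding, as expected for equivalent networks.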



Fig. 15.5. Example of a transformation of the belief network:
(a) before the transformation and (b) after the transformation



The transformation of the network makes it possible to notice an important
property of the network nodes, called statistical independence. It states
that fixing a known value of the probability assigned to the analysed node
makes the probabilities assigned to the offspring of this node statistically
independent. This property can be illustrated with reference to Fig. 15.5.
Fixing the probability P(A) for the parent, before the transformation, makes
the probabilities of its offspring P(B) and P(C) independent. This is clearly
visible after the transformation, where the value of P(A) isolates P(C) from
the influence of P(B).
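
For the network of Fig. 15.5 this property follows directly from the
factorisation (15.49). The short derivation below is a sketch that assumes
P(A) > 0:

```latex
P(B, C \mid A) = \frac{P(A, B, C)}{P(A)}
               = \frac{P(C \mid A)\,P(B \mid A)\,P(A)}{P(A)}
               = P(B \mid A)\,P(C \mid A).
```

Once the state of the parent A is known, the joint distribution of its
offspring is thus the product of their individual conditional distributions.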

Inference in belief networks consists in the estimation (negotiation) of
probabilities assigned to consecutive nodes in order to find the equilibrium
state of the network. Attempts at a search for global solutions lead to
NP-hard problems. An effective approach is to search for local solutions
which include consecutive nodes and their parents. A fundamental work which
describes the ways of formulating and solving such tasks is (Pearl, 1988).
Interesting discussions of methods of searching for the equilibrium state of a
network are included in (Jensen, 2002), (Cowell et al., 1989) and (Borgelt
and Kruse, 2002). Estimating local solutions should be preceded by possible
transformations of the network, which make it possible to minimize the
number of independent nodes. An interesting and difficult problem appears
while the network is being constructed. It deals with the optimization of the
sensitivity of the nodes. Sensitivity analysis enables us to examine how
sensitive the conclusions (distributions of output variables) are to minor
changes of the parameters of the model (sensitivity to parameters) and of the
evidence (sensitivity to evidence).

The application of belief networks can be illustrated by a simple example.
Let us consider a network which represents a journal bearing and consists of
three nodes, called radial clearance, shaft vibration and oil temperature, with
their respective values {large, normal, small}, {high, low}, {too high, normal,
too low}. Statistical dependencies between these nodes are characterised by
the conditional probabilities given in Table 15.1.



Table 15.1. Prior and conditional probabilities (an example)

A: Radial clearance

    large     normal    small
    0.25      0.60      0.15

B: Shaft vibration

    A         high      low
    large     0.7       0.30
    normal    0.20      0.80
    small     0.25      0.95

C: Oil temperature

    A         too high  normal    too low
    large     0.1       0.3       0.6
    normal    0.1       0.8       0.2
    small     0.6       0.3       0.1


The results of inference with the use of this network are shown in Fig. 15.6 
and Fig. 15.7. 
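
Both directions of inference in this three-node network can be sketched in a
few lines of code. The values below are copied as printed in Table 15.1; two
rows (B for small clearance, C for normal clearance) do not sum to 1 as
printed, so this sketch rescales every row, which is an assumption made here
and not a step described in the chapter. The function names are hypothetical,
and no attempt is made to reproduce the numbers shown in the figures.

```python
# Prior and conditional probabilities as printed in Table 15.1.
p_a = {"large": 0.25, "normal": 0.60, "small": 0.15}
p_b_given_a = {
    "large":  {"high": 0.7,  "low": 0.30},
    "normal": {"high": 0.20, "low": 0.80},
    "small":  {"high": 0.25, "low": 0.95},
}
p_c_given_a = {
    "large":  {"too high": 0.1, "normal": 0.3, "too low": 0.6},
    "normal": {"too high": 0.1, "normal": 0.8, "too low": 0.2},
    "small":  {"too high": 0.6, "normal": 0.3, "too low": 0.1},
}

def normalised(table):
    """Rescale every row to sum to 1 (an assumption of this sketch)."""
    return {a: {v: p / sum(row.values()) for v, p in row.items()}
            for a, row in table.items()}

p_b_given_a = normalised(p_b_given_a)
p_c_given_a = normalised(p_c_given_a)

def marginal(cpt):
    """Prior distribution of a child node: sum over a of P(child | a) P(a)."""
    values = next(iter(cpt.values())).keys()
    return {v: sum(cpt[a][v] * p_a[a] for a in p_a) for v in values}

def posterior_a(evidence):
    """P(A | evidence); evidence maps 'B' and/or 'C' to an observed value.

    B and C are treated as independent once A is known.
    """
    weights = {}
    for a in p_a:
        w = p_a[a]
        if "B" in evidence:
            w *= p_b_given_a[a][evidence["B"]]
        if "C" in evidence:
            w *= p_c_given_a[a][evidence["C"]]
        weights[a] = w
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}

# Causal direction: with A = small observed, B and C follow the "small" rows.
print(p_b_given_a["small"], p_c_given_a["small"])
# Diagnostic direction: update A from evidence on B, then on B and C.
print(posterior_a({"B": "low"}))
print(posterior_a({"B": "low", "C": "too high"}))
```

Adding the second piece of evidence changes the distribution of A again,
which illustrates how evidence from several offspring nodes is combined.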

15.7.4. Possibilities of application 

Belief networks can be used as dynamic networks in blackboard environments.
However, such applications require developing proper software.
In spite of numerous possible solutions proposed, e.g., in (Neapolitan, 1990;
Spiegelhalter et al., 1993), such software is not easy to work out.

Fig. 15.6. Prior probabilities for all nodes and the a posteriori
probability of the nodes B and C resulting in causal direction from the
given evidence represented by the node A (small radial clearance)



Ready-made shell applications, which contain optimized algorithms of
searching for the equilibrium state of a network, are an excellent option
for potential designers of statistical networks. The only operations the
designers have to perform before the application are to define the structure
of the network, the contents of the statements and the conditional
probabilities assigned to the network nodes. Some examples of interesting
shell programs are Netica (Norsys, Internet), MSBNx (Kadie et al., 2001) and
HUGIN (Hugin, Internet).



Fig. 15.7. A posteriori probability for the node A resulting in
diagnostic direction from the evidence represented by the node B
(low shaft vibration) and from the evidence represented by the nodes
B and C (low radial clearance, too high oil temperature)



15.8. Integration of the computer environment

The requirements related to the integration of the computer environment in
which software aiding technical diagnostics is used are well known and do not
require any explanation. This integration can be considered from different
points of view. Initially, the most important task was to ensure the
possibility of exchanging data between different software modules which
co-operate within one operational memory. The proposed approach, the
unification of shared modules which meet the requirements of the COM
(Component Object Model), made it possible to obtain high effectiveness of
the software. It was extended to the CORBA (Common Object Request Broker
Architecture) specification, which enables us to create systems with
remote procedure calls, RPC (Remote Procedure Call). The COM
and CORBA specifications can be applied provided that proper libraries
and the API (Application Programming Interface) are used. These elements are
available in very similar forms on different software platforms. However,
slight differences between them limit the possibility of moving program code
between different environments.

One of the most effective ways of achieving the discussed integration is the
application of unified databases, which make it possible to exchange data
between several programs in a simple manner. However, that approach is not
always enough. Additional requirements usually appear later, particularly
when multi-layer software is applied. It seems that a sufficient complement
to unified databases, or in some cases even a substitute for them, is the use
of the XML (Extensible Markup Language) (W3C, Internet).

15.8.1. Databases 

One can list a variety of publications dedicated to expert systems which do
not discuss databases. It is often assumed that their form and maintenance
are obvious and do not require any additional explanations. However, on the
basis of the author’s experience, one can state that databases are important
elements which co-operate with expert systems. They contain statements
describing everything that happened in reality. They can determine the
quality of the whole system. The database should be taken into account at
the initial stage of constructing the system. Nowadays, rational
and modern expert systems should be based on databases which can operate
in a network and directly co-operate with the Internet and intranets. This is
particularly important in the case of systems designed for co-operation with
several users.

The structure of the database determines the way of ordering the data. 
Selecting this structure should be strictly connected with the possibility of 
limiting redundancy. At the same time, it should guarantee proper access 
to the data written in the database. As the basic types of databases one 
can enumerate relational, hierarchical and network structures. The relational 
structure is very simple, and the language which describes the operations 
performed in the database is also uncomplicated. This kind of database is 
most often applied. 

In numerous shell expert systems one can distinguish two database
categories (Fig. 15.1):

• the static database, containing general information, e.g., about the 
structure of the object and its state; 

• the dynamic database, containing the results of measurements and the 
user’s answers (inserted with the use of procedures controlling the dialogue) 
and intermediate solutions (estimated by inference procedures on the basis 
of the known static and dynamic data). 

This division is not sufficient for the application of technical diagnos- 
tics. For some time, the authors of software aiding technical diagnostics have 
been noticing inconveniences resulting from the lack of a universal system of 
databases that could include information about the analysed objects. This 
lack is the reason for: 

• the costs of elaborating software being very high; 

• the lack of the possibility of developing the software market concerning 
diagnostics; the software could be proposed by a variety of producers and 
research centres; 

• the lack of the possibility of spreading conveniently the results of the ob- 
servations and applications of synthetic databases which contain generalised 
results of examinations. 

The range of the collected data about the object depends on the
requirements set by the inference module. Apart from data which make it
possible to identify the analysed object unambiguously, quantitative data are
collected which enable us, for example, to estimate the characteristic
frequencies of vibration appearing in consecutive phases of the object
operation. In order to limit the size of the set of the collected data, the
descriptions of the objects can be organised in hierarchical structures. Thus,
data which are common to specified groups of objects are written at higher
levels and are valid for all subordinate elements.

Apart from that, the history of events related to object maintenance is 
remembered. Particular requirements dealing with the range of the collected 
data are imposed by such objects in which the subassemblies can be replaced 
during stoppages and repairs. In this case a given subassembly X can operate 
within the object A up to the given time moment. Then it is dismounted in 
order to be repaired or regenerated. After that, it is mounted within another 
object B. The history of the objects A, B and X, collected separately, 
leads to data redundancy and limits the possibility of total inference about 
the influence of different changeable factors on the object. An interesting and 
practically significant approach to data collection (e.g., MIMOSA, Internet) 
is the distinction between two kinds of objects: segments and assets. 

The segment is an abstract object which appears as a part or subassem- 
bly in the block scheme or within technical documentation. Different require- 
ments are imposed on this object. They can be met by a variety of real 
objects. An example of segment description can be the set of data dealing 
with the tyre of the right front car wheel. This description is included in the 
user’s manual of the car. This kind of data describes the general type and 
size of the tyre. However, there is often no information about the producer, 
the model of the tyre and also the serial number of the tyre used in the given 
car. 

The asset is a real object appearing in a place which is determined by
the segment corresponding to this asset. The identification of the asset is
performed with the use of data collected especially for this selected unit,
e.g., when dealing with the tyre of the right front car wheel, the description
of the asset is possible using (at least) the type (model), serial number and
name of the trademark. In a hierarchical database, catalogue data describing
features common to several real objects of the same model made by the same
producer can be collected as data representing an element called the model,
which is superior to the asset.

In the discussed example, the trademark and type of the tyre point to
this model in the catalogue.

The distinction between segments and assets means that some additional
data which determine their connection are essential. These connections can,
for example, show which asset occupied the place indicated by the segment at
a selected moment in time.
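
The distinction between segments, assets, models and their connections can be
sketched as a small data model. The class and field names below are
hypothetical and chosen only for illustration; they are not taken from the
MIMOSA recommendations.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

@dataclass(frozen=True)
class Segment:
    """Abstract position known from the documentation."""
    segment_id: str
    description: str

@dataclass(frozen=True)
class Model:
    """Catalogue entry shared by all assets of the same producer and type."""
    producer: str
    type_name: str

@dataclass(frozen=True)
class Asset:
    """Real, individually identifiable unit."""
    serial_number: str
    model: Model

@dataclass(frozen=True)
class Installation:
    """Connection record: which asset occupied which segment, and when."""
    segment: Segment
    asset: Asset
    mounted: date
    dismounted: Optional[date] = None  # None means "still mounted"

def asset_at(history: List[Installation], segment: Segment,
             day: date) -> Optional[Asset]:
    """Return the asset mounted in the segment on the given day, if any."""
    for rec in history:
        still_there = rec.dismounted is None or day < rec.dismounted
        if rec.segment == segment and rec.mounted <= day and still_there:
            return rec.asset
    return None

# Hypothetical data for the tyre example used in the text.
segment = Segment("RF-TYRE", "tyre of the right front car wheel")
model = Model("ExampleBrand", "195/65 R15")
tyre_1 = Asset("SN-0001", model)
tyre_2 = Asset("SN-0002", model)
history = [
    Installation(segment, tyre_1, date(2002, 1, 10), date(2004, 3, 5)),
    Installation(segment, tyre_2, date(2004, 3, 5)),
]
print(asset_at(history, segment, date(2003, 6, 1)).serial_number)
```

Keeping the connection records separate from the descriptions of segments and
assets means that the history of a subassembly moved between objects is
stored once rather than repeated for every object it has been part of.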

Research whose goal is to establish general assumptions concerning
relational databases dedicated to the discussed applications has been
conducted since 1994 with the participation of numerous teams. The project is
known as the MIMOSA (Machinery Information Management Open Systems
Alliance).

The goal of the project is to elaborate a protocol for the exchange of data
related to the monitoring of the state of technical objects. The database
applied is a relational one. A particular advantage of the MIMOSA project is
that it also includes XML versions which contain libraries of document type
definitions, DTD. Nowadays, there is access to a set of general,
basic recommendations and pattern examples of databases applied within
different systems (MIMOSA, Internet). The recommendations are presented in the
form of a few dozen tables defining selected fragments of the complex
relational database as well as instructions for the import and export of text
data, which ensure the possibility of exchanging data between different
computer environments. The project is not tied to any particular relational
database management system. As a result, data can be moved between
different management systems.

15.8.2. Multi-layer software 

Multi-layer programs are a very important part of the software structure.
The increase in their significance is caused by the trend towards the
unification of the dialogue between the final user and the software, and the
search for optimal strategies of software development.

A general model of the software structure is shown in Fig. 15.8, where
one distinguishes the layer of access to data, the layer of data processing,
the layer of inference and the layer of the presentation of results. In the
traditional structure (Fig. 15.8(a)), all layers appear within the software
designed for the final user. This kind of software exemplifies the
presentation layer. In the two-layer structure (Fig. 15.8(b)), the procedures
of access to data (reading, writing, deleting, blocking, eliminating
collisions, etc.) are separated from the application program and they create
the outer layer. This layer is common to several programs. In the three-layer
structure (Fig. 15.8(c)), the next distinguished layer is data processing.
Determining the following outer layers depends on the purpose of the software.
The goal of this approach is to guarantee that the presentation software will
have a simple and unified form. One may assume that the optimal presentation
software, in the multi-layer structure (Fig. 15.8(d)), should be an Internet
browser. Such a solution enables us to take advantage of verified, typical
solutions concerning the co-operation of different software modules in a
distributed environment and in consecutive layers.

Fig. 15.8. Model of the software structure: (a) “traditional”,
(b) two-layer, (c) three-layer, (d) multi-layer
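
The separation of layers described above can be sketched as follows. This is
a deliberately small illustration: all names are hypothetical, an in-memory
dictionary stands in for a shared database, and a single threshold stands in
for the inference layer.

```python
from statistics import mean

class DataAccessLayer:
    """Outer layer shared by several programs; only this code touches storage."""

    def __init__(self):
        self._store = {}

    def write(self, key, samples):
        self._store[key] = list(samples)

    def read(self, key):
        return list(self._store[key])

def processing_layer(samples):
    """Data processing: reduce raw samples to a single feature."""
    return mean(samples)

def inference_layer(feature, limit):
    """Inference: turn the feature into a qualitative statement."""
    return "high" if feature > limit else "normal"

def presentation_layer(name, statement):
    """Presentation: formatting only, no diagnostic logic."""
    return f"{name}: {statement}"

# Hypothetical measurement values and limit.
storage = DataAccessLayer()
storage.write("shaft vibration", [2.0, 2.3, 2.1])
feature = processing_layer(storage.read("shaft vibration"))
report = presentation_layer("shaft vibration", inference_layer(feature, limit=2.0))
print(report)
```

Because each layer exchanges only plain values with its neighbours, the
presentation function could be replaced by code that generates a page for an
Internet browser without touching the other layers.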



15.8.3. Special languages 

One of the most interesting models dealing with data gathering is a model
which makes the assumption that all information about the analysed object
can be represented in the form of documents. Problems concerning the
notation of document contents and problems related to the presentation of
these contents are addressed separately.

Research on a language which would make it possible to describe any
document in a universal way has been conducted for many years. Work on its
predecessor, GML, began in 1969, and the SGML (Standard Generalised Markup
Language) was later standardised as ISO 8879 in 1986 (W3C, Internet). It
allows storing documents independently of the software used. The
language determines a set of specific rules, which is defined separately for
each class of documents. This set is called document type definition, DTD
(Document Type Definition). Because of its complexity, this language has met
with a moderate reception.

In 1989 the HTML (Hypertext Markup Language) was worked out (W3C,
Internet). It is derived indirectly from the SGML. This language evoked far
greater interest. It was initially designed mainly for describing the
presentation of texts in Internet browsers, a task with a wide range of
practical applications. The language contributed to an increase
in the significance of the Internet.

The HTML enables us to determine the form of the presentation of both text
and image data. Given fragments of the presented text or image data, such as
headings, tables, etc., can be distinguished with the use of markups, for
example, <TITLE> and </TITLE>, <H1> and </H1>, <TABLE> and </TABLE>.
These fragments can be associated with determined forms of
their presentation. However, the set of markups used in the HTML is
limited and there is no possibility of extending it. Not all markups are
interpreted in the same way by all Internet browsers.

Since the main feature of the HTML is the fact that it describes only
the form of the presentation and does not contain any information about
the structure of the presented document or the meaning of its elements, the
language is often described as WYSIAYG (what you see is all you get). It is
a paraphrase of the commonly known acronym WYSIWYG (what you see is
what you get), related to some computer text editors. This description should
not be treated as a criticism of the HTML. The language is not designed
for purposes other than presentation. However, one should keep these
limitations in mind. Because of them, in spite of numerous publications, the
HTML is not the best tool for solving tasks related to the unification of
access to distributed databases.

In 1998, the W3C (World Wide Web Consortium) introduced
a meta-language called the XML (Extensible Markup Language) (W3C,
Internet). Similarly to the HTML, the XML is derived from the SGML. As
opposed to the HTML, which describes the presentation of the document, the
XML describes the meaning and structure of the document. The language
allows sorting, selecting and modifying data included within documents. These
operations can be performed both on the user’s station and on the server. The
presentation of a document defined in the XML can be described with the use
of the XSL (Extensible Stylesheet Language). The XML document contains
data (text) and markups of the XML, which are similar to the markups of
the HTML. The document consists of elements, which are hierarchically
ordered. Each element includes data placed between the obligatory opening and
closing markups, for example, <markup>data</markup>. The elements of the
document can be associated with attributes written in the opening markups, for
example, <amplitude size="mm/s">2.14</amplitude>. These markups do
not have any names. Names can be introduced only for elements (with the
use of attributes).
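
Reading such a document is straightforward with a standard XML parser. The
sketch below uses Python’s `xml.etree.ElementTree`; the `amplitude` element is
the one quoted in the text, while the surrounding `measurement` and `object`
elements are hypothetical and added only to make the document complete.

```python
import xml.etree.ElementTree as ET

document = """
<measurement>
  <object>journal bearing</object>
  <amplitude size="mm/s">2.14</amplitude>
</measurement>
"""

root = ET.fromstring(document)
amplitude = root.find("amplitude")   # child element, selected by its name
value = float(amplitude.text)        # data placed between the markups
unit = amplitude.get("size")         # attribute written in the opening markup
print(value, unit)
```

The program selects data by the name of the element rather than by its
position on a page, which is the practical difference between a description
of structure and a description of presentation.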

An interesting solution (derived from the SGML) was the introduction of
document type definition, DTD. It is a description of the model of the
contents of a selected class of XML documents. The DTD enables us to verify
the correctness of a document written in the XML. The next step in the
development of XML applications was the introduction of XML schemas (W3C,
Internet), which, similarly to the DTD, are patterns of XML documents.
In comparison to the DTD, they are more convenient to use. The schemas are
written directly in the XML and do not require any knowledge of the complex
syntax of the DTD. An additional property of the schemas is that, being
written in the XML, they are extensible. This means that the software engineer
is allowed to introduce supplementary elements when required (Schema,
Internet). One should stress that, owing to the described solutions, data
written in the XML can carry the description of their own structure. This
opens very important possibilities of integrating computer systems. This
approach is particularly important in the case of systems where proper
knowledge representation as well as the acquisition and application of
knowledge specific to a given domain play the key role.

One limitation of research related to the unification of databases and
the elaboration of DTD patterns was the global character of the names
applied. The original idea of the W3C was to allow the use of several
dictionaries, where only the names of these dictionaries have a global
character (W3C, Internet). This possibility was
included in the XML schemata.
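
In XML, such dictionaries are realised as namespaces. The sketch below shows
how two elements with the same local name can coexist when they come from
different namespaces; both namespace identifiers and all element names are
made up for this example.

```python
import xml.etree.ElementTree as ET

# Both namespace URIs are hypothetical; only they need to be globally unique.
document = """
<report xmlns:vib="http://example.org/vibration"
        xmlns:oil="http://example.org/oil">
  <vib:level>high</vib:level>
  <oil:level>normal</oil:level>
</report>
"""

namespaces = {"vib": "http://example.org/vibration",
              "oil": "http://example.org/oil"}

root = ET.fromstring(document)
vib_level = root.find("vib:level", namespaces).text
oil_level = root.find("oil:level", namespaces).text
print(vib_level, oil_level)
```

The local name `level` is used twice without conflict, because each
occurrence is qualified by the dictionary it belongs to.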



15.9. System testing 

The main goal of testing expert systems is to ensure their quality. It is worth 
stressing that procedures of testing are in this case particularly difficult to 
elaborate. 

The literature devoted to the methods of software testing is very exten- 
sive. In the case of testing an expert system, the following operations are 
recommended: 

• tests of the knowledge base, which consist, e.g., in verifying the rules
in order to identify contradictory, redundant or looped rules;

• tests of the correctness of the operation of the knowledge base for all 
possible values of the input data; 

• tests of the modules of the expert system. 

In practice, the proposed approach can be related to simple systems, with a 
small number of input data. 
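
The first kind of test listed above can be sketched for a very simple rule
format. The representation below is an assumption made for illustration: each
rule is a pair (set of premises, conclusion), statements are plain strings,
and negation is written with the prefix "not ". Real knowledge bases, in
particular those with approximate statements, require more elaborate checks.

```python
from itertools import combinations

def negate(statement):
    """Return the negation of a statement written as a plain string."""
    return statement[4:] if statement.startswith("not ") else "not " + statement

def contradictory(rules):
    """Pairs of rules with identical premises and opposite conclusions."""
    return [(r1, r2) for r1, r2 in combinations(rules, 2)
            if r1[0] == r2[0] and r1[1] == negate(r2[1])]

def redundant(rules):
    """Pairs of rules that repeat the same premises and conclusion."""
    return [(r1, r2) for r1, r2 in combinations(rules, 2) if r1 == r2]

def looped(rules):
    """Statements that can be derived, directly or indirectly, from themselves."""
    graph = {}
    for premises, conclusion in rules:
        for premise in premises:
            graph.setdefault(premise, set()).add(conclusion)

    def reachable(start):
        seen, stack = set(), [start]
        while stack:
            for nxt in graph.get(stack.pop(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    return sorted(s for s in graph if s in reachable(s))

# Hypothetical rule base with one defect of each kind.
rules = [
    (frozenset({"vibration high"}), "clearance large"),
    (frozenset({"vibration high"}), "not clearance large"),
    (frozenset({"clearance large"}), "bearing worn"),
    (frozenset({"bearing worn"}), "clearance large"),
    (frozenset({"clearance large"}), "bearing worn"),
]
print(len(contradictory(rules)), len(redundant(rules)), looped(rules))
```

These checks inspect only the structure of the rules. Testing the behaviour
of the knowledge base for all combinations of input values, the second item
in the list, grows much faster with the number of inputs, which is why the
approach is practical only for simple systems.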

One can list many tools designed for testing expert systems (Finke et al.,
1996). However, the literature on these methods does not provide exhaustive
proposals for documenting such tests. There is a lack of descriptions
of repeatable and adapted tests of knowledge bases as well as of systems whose
operation is based on numerous sets of rules. Such systems transform sets of
input values so large that their review is impossible. The need for proper
recommendations is very often postulated, e.g., in (Avritzer et al., 1996).

It should be stressed that there is also a lack of a uniform opinion on 
whether testing the expert system should be performed as testing the black 
box (with unknown interior) or as testing a module (software) whose total 
specification is known. 

It seems that the representation of the database written in the form of
(15.39) is suitable for testing it through the introduction of
intended redundancy (additional independent rules). This solution makes it
possible to monitor continuously the values of the norm ||^ || , which is a
measure of contradictions appearing in the database. The plots of the values
of the norm can show the quality of the knowledge base.



15.10. Summary

In the present chapter, fundamental classes of expert systems applied in
technical diagnostics were discussed. It seems that considerable importance
should be attached to the further development of systems that are based on
belief networks. They can effectively aid solving tasks characteristic of
technical diagnostics. These tasks can be interpreted as exact and
approximate classification.

Selected problems related to the application of expert systems aiding tech- 
nical diagnostics were also discussed. To elaborate and put into operation such 
a system, one should take into account the following operations: 

• estimating a detailed range of the required tasks which should be un- 
dertaken by the system being put into operation; 

• selecting proper measurement devices, preparing and activating the 
maintenance procedures of these devices; 

• determining the structure of databases which take into account the local 
requirements of the selected object; 

• elaborating dictionaries of object names, attributes and qualitative
values of attributes as well as elaborating procedures transforming
quantitative values of signals into qualitative ones;

• selecting the class of the system applied; this is based on the analysis of
the possibility of realizing a system that is based on blackboards including
approximate statements, as well as a system which is based on belief networks
instead of a system based on the rule representation;

• selecting a shell expert system or making a decision about its individual 
elaboration; 

• performing the process of knowledge acquisition for the system being 
elaborated; 

• putting into operation the final version of the system and verifying the 
performance of its co-operation with measurement devices; 

• determining the range of essential simulation experiments, performing 
this research and determining the updating of the knowledge base on the 
basis of the obtained results; 

• putting into operation the process of a systematic improvement of the 
knowledge base and the development of the explanation module; 

• carrying out staff training. 

An effective realization of the above tasks by one person or a small staff 
group is impossible. One should be aware of the fact that the elaboration of 
an expert system which aids the processes of diagnosing complex objects is 
an extremely difficult task. 

Chapter 16



SELECTED METHODS OF KNOWLEDGE 
ENGINEERING IN SYSTEMS DIAGNOSIS 



Antoni LIGĘZA*



16.1. Introduction 

In the process of fault diagnosis of technical systems one uses different
methods of knowledge representation and inference paradigms. The most common
scenario of such a process consists in the detection of faulty behavior of the
system, classification of that kind of behavior, search for and determination
of causes of the observed misbehavior, i.e., the generation of potential
diagnoses, verification of diagnoses and selection of the correct one, and,
finally, the repair phase. There exist a number of approaches and diagnostic
procedures having their origin in very different branches of science, such as
mechanical engineering, electrical engineering, electronics, automatics, or
computer science. In the diagnosis of complex dynamical systems an important
role is played by approaches originating in automatics (Koscielny, 2001) and
computer science (Frank and Köppen-Seliger, 1995; Hamscher et al., 1992;
Johnson and Keravnou, 1985; Korbicz and Cempel, 1993; Liebowitz, 1998), and
they seem to be the most interesting ones. A fine comparative analysis of some
selected approaches was presented in (Cordier et al., 2000a; 2000b). The
present chapter is devoted to the presentation of some selected approaches
originating in computer science.

Consider the mathematical point of view of the diagnostic process. Taking 
such a viewpoint, the problem of building a diagnostic system, including the 
issues of the representation and acquisition of diagnostic knowledge as well 
as the implementation of a diagnostic reasoning engine, can be considered as 
one of searching for an inverse function or relation. In fact, such a problem is 
inverse to the simulation task; some of its specific features are as follows: (i)
one observes faulty behavior of the simulated system (and thus apart from 
knowledge about the correct behavior, information about the faulty behavior 
should also be accessible) and (ii) taking into account the observed state 
(output), the main goal is not the reconstruction of the input (control), but 
rather a search for the causes of the failure. 

Assume that one of $n$ system components can become faulty, with each
elementary fault having a binary character: such a component can either
work correctly or not. Let $D$ denote a set of potential elementary causes to
be considered and let $D = \{d_1, d_2, \ldots, d_n\}$. Faulty behavior of the
system can be stated (detected) through the observation of one or more
symptoms of the failure. Assume that there are $m$ such symptoms to be
considered and their evaluation is also binary. Let $M$ denote a set of such
symptoms, where $M = \{m_1, m_2, \ldots, m_m\}$. The detection of a failure
consists in detecting the occurrence of at least one symptom $m_i \in M$. In
general, some subset of the symptoms in $M$ can be observed in case of a
failure. The goal of the diagnostic process is the generation of a diagnosis
being a subset of $D$ such that, taking into account expert knowledge and/or
the system model, it explains the observed misbehavior.

In a general case, the result of the diagnostic process can consist of one
or more potential diagnoses; these diagnoses - subsets of the set D - can be 
single-element sets (i.e., the so-called elementary diagnoses) or multi-element 
ones. For simplicity, in a number of practical approaches only single-element 
diagnoses are taken into account. In the case of complex, multi-element di- 
agnoses the discussion is frequently restricted to the so-called minimal diag- 
noses, i.e., subsets of D which explain the observed misbehavior in a satis- 
factory way and at the same time all their elements are necessary to justify 
the diagnosis. 
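
The notion of a minimal diagnosis can be illustrated with a short Python sketch. The candidate sets below are hypothetical and are assumed to have already been found to explain the observed misbehavior; the sketch only shows the filtering by set inclusion, not how candidates are generated.

```python
def minimal_diagnoses(candidates):
    """Keep only candidates that have no proper subset among the candidates."""
    sets = {frozenset(c) for c in candidates}
    minimal = [s for s in sets if not any(other < s for other in sets)]
    # Sort for a reproducible presentation: by size, then alphabetically.
    return sorted((sorted(s) for s in minimal), key=lambda s: (len(s), s))


# Hypothetical candidate diagnoses over D = {d1, d2, d3}
candidates = [{"d1"}, {"d1", "d2"}, {"d2", "d3"}, {"d1", "d2", "d3"}]
print(minimal_diagnoses(candidates))
```

In this example the sets {d1, d2} and {d1, d2, d3} are discarded because they contain the smaller diagnosis {d1}.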

For the sake of general consideration, it can be assumed that there exists 
a causal dependency between elementary faults represented by the elements 
of D and failure symptoms represented by the elements of M. Hence, there 
exists some relation $R_C$ (i.e., a causal relation), such that

$$R_C \subseteq 2^D \times 2^M, \tag{16.1}$$

i.e., any failure defined as a subset of $D$ is assigned one or more sets of
possible failure symptoms; in certain particular cases the failure, although
it occurred, may be unobservable. Such an approach, however, is
non-deterministic: a single failure may be assigned several different sets of
symptoms of the observed misbehavior. Therefore it is frequently assumed that
the causal dependency $R_C$ is a functional one, i.e.,

$$R_C : 2^D \to 2^M. \tag{16.2}$$

In this approach any failure results in a unique and well-defined set of 
symptoms. In this case the task of building a diagnostic system consists in 
finding the inverse function, i.e., the so-called diagnostic function $f$, where
$f = R_C^{-1}$. Unfortunately, in many realistic systems the function $R_C$ is
not a one-to-one mapping, so there does not exist an inverse mapping in the
form of a function. The simplest example of such a system is a set of $n$
bulbs (e.g., one for a Christmas tree) connected in series. An elementary
diagnosis $d_i$ is equivalent to the $i$-th bulb being blown. However, the set
of manifestations of the failure $M = \{m_1\}$ is a single-element set, where
$m_1$ stands for “the bulbs are not on”. Even when the analysis is restricted
to considering single-element, elementary diagnoses, there exist $n$
potentially equivalent diagnoses; each of them has the same result, $m_1$. If
multi-element diagnoses are admitted, then there exist $2^n - 1$ potential
diagnoses.
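
The counting argument for the bulbs can be checked with a small Python sketch. The function names and the string labels for bulbs and symptoms are assumptions introduced here for illustration only.

```python
from itertools import combinations


def symptoms(faulty_bulbs):
    """Causal function for bulbs in series: any blown bulb darkens the chain."""
    return {"m1"} if faulty_bulbs else set()


def candidate_diagnoses(n, observed):
    """All non-empty subsets of D whose predicted symptoms equal the observed ones."""
    bulbs = [f"d{i}" for i in range(1, n + 1)]
    return [
        set(subset)
        for size in range(1, n + 1)
        for subset in combinations(bulbs, size)
        if symptoms(subset) == observed
    ]


n = 4
diagnoses = candidate_diagnoses(n, {"m1"})
print(len(diagnoses))                            # all non-empty subsets of D
print(sum(1 for d in diagnoses if len(d) == 1))  # elementary diagnoses only
```

For n = 4 the sketch finds 15 candidate diagnoses, of which 4 are elementary, so the single symptom cannot distinguish between them.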

In practice, the development of a diagnostic system consists in finding the
inverse relation $R_C^{-1}$ and, more precisely, in searching for it during the
diagnostic process. In many practical diagnostic systems the diagnostic process
is interactive, and additional tests and measurements can be undertaken in
order to restrict the area of the search. In the case of the serially connected
bulbs such an approach may consist in the examination of certain bulbs or
rather their groups (an optimal strategy is to divide the circuit into two
equal parts at each step). Complex diagnostic systems use hierarchical strategies
for identifying faulty subsystems, interactive diagnostic procedures with the 
use of supplementary tests and observations aimed at restricting the search 
area, accessible statistical data in order to establish the most probable diag- 
noses, and apply heuristic methods in diagnosis. One of the basic, frequently 
applied heuristics is considering only elementary diagnoses. Another, more 
advanced one consists in considering only minimal diagnoses. 
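
The halving strategy for the serially connected bulbs can be sketched in Python as follows. The sketch assumes that exactly one bulb is blown (an elementary diagnosis) and that a test is available which tells whether a contiguous group of bulbs passes current; the function `segment_conducts` is a hypothetical stand-in for such a measurement.

```python
def locate_blown_bulb(n, segment_conducts):
    """Find the single blown bulb among bulbs 0..n-1 by halving the suspect range.

    segment_conducts(lo, hi) is an assumed test that returns True when the
    bulbs with indices lo..hi-1 pass current (i.e., none of them is blown).
    """
    lo, hi = 0, n
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tests += 1
        if segment_conducts(lo, mid):
            lo = mid   # the left half is fine, so the fault is on the right
        else:
            hi = mid   # the fault is in the left half
    return lo, tests


blown = 11  # hypothetical ground truth, unknown to the diagnostic procedure
found, tests = locate_blown_bulb(16, lambda lo, hi: not (lo <= blown < hi))
print(found, tests)
```

For 16 bulbs the faulty one is isolated after 4 group tests, instead of up to 15 tests of individual bulbs.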



16.2. Review and taxonomy of knowledge engineering 
methods for diagnosis 

Knowledge engineering methods occupy an important position both in tech- 
nological systems diagnostics and in medical diagnostics (Johnson and Ker- 
avnou, 1985; Liebowitz, 1998; Tzafestas, 1989). They originate in research 
in the domain of Artificial Intelligence (AI), and, in particular, in investi- 
gations concerning knowledge representation methods and automated infer- 
ence. These methods are a good example of practical applications of AI tech- 
niques. They are mostly based on logical, graphical and rule-based knowledge 
representation and automated inference methods. A characteristic feature of 
knowledge engineering methods is that they use mostly a symbolic representa- 
tion of domain and expert knowledge and that they use automated inference 
paradigms for knowledge processing. They can also make use of numerical 
data and models (if accessible) as well as uncertain, incomplete, fuzzy or 
qualitative knowledge. A common denominator and core for all the methods 
is mathematical logic. 

The key issue in knowledge engineering is knowledge representation and
knowledge processing; some other typical activities include knowledge acqui- 
sition, coding and decoding, analysis and verification, and recently also the 
synthesis of knowledge, mostly through an inductive generation of rules from 
examples (Moczulski, 1997). Because of the specific character of knowledge 
engineering methods, originating mostly in symbolic methods of knowledge 
manipulation in AI, the taxonomy of such methods is different from that of
methods developed in the automatic control area (Koscielny, 2001; Koscielny 
and Szczepaniak, 1997), and it constitutes its extension and completion. This 
extension is oriented towards taking into account specific aspects of knowl- 
edge engineering methods while the taxonomy takes into account both the 
applied tools and the philosophy of specific approaches. In particular, in the 
case of knowledge engineering methods some essential issues of diagnostic 
approaches are: 

• the type (source) and the way of specifying diagnostic knowledge, 

• the applied knowledge representation methods, 

• the applied inference methods, 

• the inference control mechanism. 

Diagnostic knowledge can in fact have two different origins. First, it can
be the so-called shallow knowledge, having its source in observations and
experience. Such a kind of knowledge is also called expert knowledge when it is
appropriately significant, and frequently its acquisition consists in interview- 
ing some domain experts. In this case the knowledge of the system model 
(the structure, principles of work, mathematical models) is not required. The 
specification of such knowledge may take the external form of a set of ob- 
servations of faults and diagnoses assigned to them, a training sequence or 
ready-to-use rules coming from an expert. Approaches based on the use of 
shallow knowledge are generally classified as expert methods. 

Secondly, knowledge may originate in the analysis of mathematical models 
(the structure, equations, constraints) of the diagnosed system; this knowl- 
edge is referred to as the so-called deep knowledge. When such knowledge is 
accessible, the diagnostic process can be performed with the use of the mod- 
el of the system being analyzed, i.e., the so-called model-based diagnosis is 
performed. Deep knowledge takes the form of a specific mathematical model 
(adapted for diagnostic purposes) and possibly some heuristic or statistical 
characteristics useful for direct diagnostic reasoning. Most frequently, the 
specification of deep knowledge includes the definition of the internal struc- 
ture and dependencies valid for the analyzed system in connection with the 
set of elements whose faults are subject to diagnostic activities, as well as the 
specification of the current state of the system (observations). Approaches 
based on the use of deep knowledge are classified as model-based approaches.
Obviously, a wide spectrum of intermediate cases combining both of the
above approaches in appropriate proportions is also possible.

Knowledge representation methods include mostly symbolic ones, such as
facts and inference rules, logic-based methods, trees and graphs, semantic 
networks, frames, scenarios and hybrid methods. Numerical data (if present) 
may be represented with the use of vectors, sequences, tables, etc. Mathemat- 
ical models (e.g., in the form of functional equations, differential equations, 
constraints) may also be used in modeling and failure diagnosis. 

Reasoning methods used in diagnostics include logical inference (deduc- 
tion, abduction, induction, consistency-based reasoning, non-monotonic rea- 
soning) as well as methods of knowledge processing in rule-based systems 
originating in logic (forward chaining, backward chaining, bidirectional infer- 
ence), pattern matching algorithms, search methods, case-based reasoning, 
and others. In the case of numerical data, various methods of training sys- 
tems, both parametric and structural ones, are also applied. 

The control of diagnostic inference is mainly aimed at enhancing efficiency,
so that all the diagnoses (in the case of a complete search) or only the most
probable ones (in the case of an incomplete search) are generated as fast as
possible and the obtained diagnoses are ordered from the most likely ones
to the least likely ones. The applied methods include the blind
search, ordered search, heuristic search, the use of statistical information and 
methods, the use of qualitative probabilities, the use of supplementary tests in 
order to confirm or reject search alternatives as well as hierarchical strategies. 
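
One simple way of ordering diagnoses with the use of statistical information can be sketched in Python. The sketch assumes that component faults are statistically independent and that prior fault probabilities are known; both the independence assumption and the numbers below are hypothetical and are not taken from the text.

```python
from math import prod


def diagnosis_probability(diagnosis, fault_prob):
    """Prior probability that exactly the components in `diagnosis` are faulty."""
    return prod(
        p if component in diagnosis else 1.0 - p
        for component, p in fault_prob.items()
    )


def rank(diagnoses, fault_prob):
    """Order diagnoses from the most likely to the least likely."""
    return sorted(
        diagnoses,
        key=lambda d: diagnosis_probability(d, fault_prob),
        reverse=True,
    )


# Hypothetical prior fault probabilities of three components
fault_prob = {"d1": 0.01, "d2": 0.05, "d3": 0.02}
ranked = rank([{"d1"}, {"d2"}, {"d1", "d3"}], fault_prob)
print(ranked)
```

With these numbers the elementary diagnosis {d2} comes first and the two-element diagnosis {d1, d3} comes last, which also illustrates why restricting the search to elementary or minimal diagnoses is a common heuristic.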

The taxonomy of diagnostic approaches presented below is based on the 
knowledge engineering point of view and takes into account mainly the type 
and the way of diagnostic knowledge specification and the applied methods 
of knowledge representation. 

Expert methods 

1. Methods based on the use of numerical data: 

• pattern recognition methods in feature space, 

• classifiers using the technology of artificial neural networks, 

• simple rule-based classifiers, including fuzzy rule-based systems, 

• hybrid systems. 

2. Methods using symbolic data and knowledge (classical knowledge engi- 
neering methods, simple algebraic formalisms, graphic- and logic-based 
methods and those based on domain expert knowledge): 

• diagnostic tests, 

• fault dictionaries, 

• decision trees, 

• decision tables, 

• logic-based methods, mainly rule-based systems and expert systems, 

• case-based systems. 

Model-based methods

1. Consistency-based methods: 

• consistency-based reasoning using purely logical models (Reiter’s 
theory (1987)), 

• consistency-based reasoning using mathematical, causal and qual- 
itative models. 

2. Causal methods: 

• diagnostic graphs and relations, 

• fault trees, 

• causal graphs, 

• logical abductive reasoning, 

• logical causal graphs. 

16.3. Modelling causal relationships in diagnosis 

Modeling causal relationships plays a very important role in diagnosis. It 
makes it possible to point to the dependencies between the potential faults 
of the elements of the system under consideration and its observed behavior 
resulting from the faults. Knowledge about causal dependencies allows effi- 
cient diagnostic reasoning based on a direct use of a causal model or some 
rules generated on the basis of that model. 

Let d denote any fault of some element of the diagnosed system. In the 
simplest case it is assumed that the fault can either occur or not; from
a logical point of view d may be considered to denote a propositional logic 
atomic formula. For the purpose of diagnosis, d will be referred to as an 
elementary diagnosis and as a logical formula it will be assigned a logical 
value true (if the fault occurs) or false (in case the fault is not observed). 

Analogously, let m denote a visible result of some fault d; m can be
observed in a direct way or can be detected with the use of appropriate tests or
measurements. In the simplest case the manifestation m may be either
observed or not, so, as before, from a logical point of view, m can be
considered to denote an atomic formula of propositional logic which can be
assigned a logical value true or false. For the purpose of diagnosis, m will be
referred to as a diagnostic signal or a manifestation, or just a symptom of a
failure.

If there exists a causal relationship between d and m, it means that d is 
a cause of m and m is an effect of d. Let $t_n$ denote the time instant at which
some symptom n occurred. For the existence of a causal relationship between
the symptoms d and m it is necessary for the following conditions to be valid:

• $d \models m$, i.e., m is a logical consequence of d,

• $t_d < t_m$, i.e., a cause precedes its result,

• there exists a flow of a physical signal from the symptom d to the 
symptom m. 

The first condition, the one of the logical consequence, means that whenever
d takes the logical value true, m must also take the logical value true. So,
this condition means that the existence of a causal relationship requires also
the existence of a logical consequence¹; this allows the application of logical
inference models to the simulation of the system behavior as well as to
reasoning about possible causes of the failure. The condition of precedence in
time means that the cause must occur before its result, and that the result
occurs after the occurrence of its cause. This implies some obvious
consequences for the modeling of the behavior of dynamic systems in case of a
failure. The last condition means that there must exist a way of transferring
the dependency (a signal channel facilitating the flow of the physical signal);
the lack of such a connection indicates that two symptoms are independent,
i.e., there is no cause-effect relationship between them. A more detailed
analysis of theoretical foundations of the causal relationship phenomenon from
the point of view of diagnosis can be found in (Fuster-Parra, 1996;
Fuster-Parra and Ligęza, 1996a; 2000).

The presented model of a causal relationship is in fact the simplest kind 
of the so-called strong causal relationship; by relaxing the condition of the 
logical implication a potential relationship can be obtained (in such a case 
the occurrence of d may, but not necessarily, mean the occurrence of m), 
including a causal relationship of a probabilistic nature (characterized by 
some quantitative or qualitative probability). For simplicity, such extensions 
are not considered here. Another extension may consist in including the causal 
relationship between several cause symptoms and several result symptoms 
described with some functional dependencies; as this case is important for 
technical diagnosis it will be considered here in brief. 

Let $V$ denote some set of symptoms, $V = \{v_1, v_2, \ldots, v_k\}$; the discussion
is restricted to logical symptoms taking the value true or false. In some cases it 
may be observed that there exists a causal relationship between the symptoms 
of V constituting a common cause for some symptom v and this symptom. 
In particular, the following two cases are of special interest: 





$$v_1 \lor v_2 \lor \cdots \lor v_k \models v, \tag{16.3}$$

and

$$v_1 \land v_2 \land \cdots \land v_k \models v. \tag{16.4}$$



In the first case, the occurrence of at least one symptom from $V$ results
in the occurrence of $v$; it is said that there exists a disjunctive relationship
and the symptom $v$ is of the OR type. By using an arrow to represent a
causal relationship, the dependency of the disjunctive type will be denoted
by $v_1 | v_2 | \cdots | v_k \to v$. In the other case it is said that the
relationship is a conjunctive one: for the occurrence of $v$ it is necessary for
all the symptoms of $V$ to occur; it is said that the symptom $v$ is of the AND
type. The conjunctive relationship is denoted by
$[v_1, v_2, \ldots, v_k] \to v$.

¹ The existence of a logical consequence of the form $d \models m$ does not mean
that there exists also a causal relationship; for example, the occurrence of d
and m may be observed as some independent results of some other, external
common cause.

Furthermore, in some cases it may happen that the occurrence of some 
symptom will make another symptom disappear and vice versa; in such a 
case it is said that the relationship is of the NOT type, i.e., 

$$u \models \neg v \quad \text{and} \quad \neg u \models v. \tag{16.5}$$

This kind of a causal relationship is denoted by u — > v. 
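
The OR, AND and NOT relationships can be sketched in Python as a simple forward propagation of truth values. The sketch makes a simplifying assumption that goes beyond the logical consequence stated above: each effect is assumed to have no causes other than those listed, so the consequence is read as an equivalence. The symptom names and the small graph are hypothetical.

```python
def propagate(facts, rules):
    """Forward-propagate truth values through OR, AND and NOT causal nodes.

    facts: dict mapping symptom name -> bool for the known (input) symptoms.
    rules: list of (kind, causes, effect), listed so that all causes of an
           effect are evaluated before the effect itself.
    """
    values = dict(facts)
    for kind, causes, effect in rules:
        inputs = [values[c] for c in causes]
        if kind == "OR":
            values[effect] = any(inputs)    # at least one cause occurs
        elif kind == "AND":
            values[effect] = all(inputs)    # all causes occur
        elif kind == "NOT":
            values[effect] = not inputs[0]  # the cause suppresses the effect
        else:
            raise ValueError(f"unknown node type: {kind}")
    return values


facts = {"v1": False, "v2": True, "v3": True}
rules = [
    ("OR", ["v1", "v2"], "w1"),
    ("AND", ["w1", "v3"], "w2"),
    ("NOT", ["w2"], "w3"),
]
result = propagate(facts, rules)
print(result["w1"], result["w2"], result["w3"])
```

Such a propagation corresponds to simulating the effects of given faults; diagnosis requires reasoning in the opposite direction, from the observed effects to their possible causes.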

The presented causal relationship applies to symptoms having the character
of propositional logic variables, i.e., to formulae taking the value true or
false. Such symptoms, apart from denoting the occurrence of a discrete event
(e.g., “tank overflow”, “signal is on”, etc.), may also show that certain
continuous variables take some predefined values or achieve certain levels,
i.e., de facto they can encode some formulas of the form $X = w$ or $X \in W$,
where $X$ is some process variable, $w$ is its value, and $W$ is some set
(interval) of values. In such a case the qualitative reasoning and qualitative modeling
of the causal relationship at the level of propositional logic only may turn 
out to be insufficient. A more general notation for the representation of the 
causal relationship when the values of certain variables influence the values 
taken by another variable may take the following form: 

$$v_1, v_2, \ldots, v_k \xrightarrow{\psi} v, \tag{16.6}$$

or the form of an equation:

$$\psi(v_1, v_2, \ldots, v_k) = v. \tag{16.7}$$

Note that in this case it is important that the variables $v_1, v_2, \ldots, v_k$
influence the variable $v$, and the quantitative (or qualitative) characteristics
of this influence are expressed by means of an appropriate equation. In prac- 
tice, such characteristics can be expressed using a look-up table specifying 
the values of v for different combinations of the values of the input variables. 
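
A look-up table of this kind can be sketched in Python as a dictionary. The function on the left-hand side of (16.7) is called `psi` in the code (an assumed name), and the qualitative values and table entries below are purely hypothetical.

```python
# Hypothetical qualitative look-up table for a dependency of v on (v1, v2)
PSI_TABLE = {
    ("low", "low"): "low",
    ("low", "high"): "normal",
    ("high", "low"): "normal",
    ("high", "high"): "high",
}


def psi(v1, v2):
    """Return the value of v for the given combination of input values."""
    return PSI_TABLE[(v1, v2)]


print(psi("low", "high"))
```

A combination of input values that is missing from the table raises a `KeyError`, which makes incomplete specifications of the dependency easy to detect.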



16.4. Consistency-based diagnostic reasoning 

Consistency-based diagnostic reasoning is a relatively new diagnostic infer- 
ence paradigm which is based on a formal theory presented in the paper by 
Reiter (1987). The main idea of this paradigm consists in the comparison 
of the observed system behavior and the one which can be predicted with 
the use of knowledge about the system model. On the one hand, this kind of
reasoning does not require expert knowledge, long-term data acquisition or 
experience, or a training stage of the diagnostic system. On the other hand, 
it requires knowledge about the system model allowing the prediction of its 
normal correct behavior. More precisely, what is called for is the model of the 
correct behavior of the system, i.e., a model which can be used to simulate 
the normal work of the system when there are no faults. 

Fig. 16.1. Idea of consistency-based diagnostic
reasoning based on the system model 



The idea of such a diagnostic approach is presented in Fig. 16.1. The real 
system and its model process the same input signals In. The output of the 
system Out is compared to the expected output Exp generated with the 
use of the model. The difference of these signals, the so-called residuum $R$,
is directed to the diagnostic system DIAG. A residuum equal to zero (with 
some predefined accuracy) means that the currently observed behavior does 
not differ from the expected one obtained with the use of the model; if this 
is the case, it may be assumed that the system works correctly. 
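
The comparison of the observed and expected outputs can be sketched in Python as follows. The tolerance value, the function names and the sample signals are assumptions made for illustration; the expected values stand for the output of some model of correct behavior.

```python
def residual(observed, expected):
    """Element-wise difference between the system output and the model output."""
    return [o - e for o, e in zip(observed, expected)]


def is_consistent(observed, expected, tol=1e-3):
    """True when every residual is zero within the predefined accuracy tol."""
    return all(abs(r) <= tol for r in residual(observed, expected))


expected = [2.0, 4.0, 6.0]          # Exp: hypothetical model output
healthy = [2.0, 4.0005, 6.0]        # Out: small deviation within the accuracy
faulty = [2.0, 4.0, 7.5]            # Out: significant deviation

print(is_consistent(healthy, expected))
print(is_consistent(faulty, expected))
```

The first check succeeds and the second one fails; only in the second case would the residual be passed on for further diagnostic reasoning.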

When some significant difference of the current behavior of the system 
from the one predicted with the use of the model can be observed, it must be 
stated that the observed behavior is inconsistent with the model. The detec- 
tion of such behavior (or, actually, misbehavior) implies that a fault occurs 
(on the assumption that the model is correct and appropriately accurate), 
i.e., fault detection takes place. In order to determine potential diagnoses,
appropriate reasoning allowing for some modifications of the assumptions about
the model must be carried out (in the figure it is shown by means of the arrow 
reflecting the “tuning” of the model); when it is possible to relax the assump- 
tion about the correct work of the system components in such a way that 
the predicted behavior is consistent with the observed one, then the modified 
model defines which of its elements may have become faulty. In this way a 
potential diagnosis D can be obtained (or a set of alternative diagnoses). In 
this case diagnoses are represented by the sets of system components which 
are faulty, and such that assuming them faulty explains in a satisfactory way 
the observed misbehavior by regaining the consistency of the observed output 
with the output of the modified model. 



16.4.1. Introduction to logic and consistency-based reasoning 

In the diagnostic approach based on the analysis of inconsistency, the logical
model of system behavior is used very frequently. Such a model constitutes
in fact a theory describing the correct system behavior supplemented with
assumptions about the correct work of all its elements. Models encountered
most frequently are based on the use of logic for specifying the complete and
correct system behavior. Since diagnosed systems are often combinational
logic circuits (a typical example is the full adder), first order predicate
calculus is applied as the tool for specifying the model (Genesereth and
Nilsson, 1987).

The formulae of predicate calculus are built of terms, relational symbols 
(predicates), logical connectors, quantifiers and some auxiliary symbols, such 
as parentheses and the comma. The terms are used to represent any objects, 
even those with a complex internal structure. A term is: 

• any constant, e.g., a, b, c, etc., 

• any variable, e.g., X, Y, Z, etc. and 

• if $f$ is an n-place functional symbol and $t_1, t_2, \ldots, t_n$ are terms,
then also any expression of the form $f(t_1, t_2, \ldots, t_n)$ is a term.

Nothing more is a term. 
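
The recursive definition of a term maps naturally onto a recursive data type. The following Python sketch is one possible representation; the class and function names are assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Const:
    name: str


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Func:
    symbol: str
    args: Tuple["Term", ...]   # the arguments are terms themselves


Term = Union[Const, Var, Func]


def show(term):
    """Render a term in the usual notation f(t1, ..., tn)."""
    if isinstance(term, (Const, Var)):
        return term.name
    return f"{term.symbol}({', '.join(show(a) for a in term.args)})"


def variables(term):
    """Collect the names of all variables occurring in a term."""
    if isinstance(term, Var):
        return {term.name}
    if isinstance(term, Func):
        return set().union(*(variables(a) for a in term.args))
    return set()


t = Func("g", (Func("f", (Var("X"), Const("a"))), Var("Y")))
print(show(t), variables(t))
```

The nesting of `Func` objects reflects the third clause of the definition, which allows terms of arbitrary depth.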

By defining the relations which hold between objects denoted by terms
one can build the so-called atomic formulae (atoms or facts). If $p$ is an
n-place relational symbol and $t_1, t_2, \ldots, t_n$ are terms, then any
expression of the form $p(t_1, t_2, \ldots, t_n)$ is an atomic formula. Atomic
formulae constitute some simple statements which can be interpreted as follows:
“n-place relation $p$ holds for objects $t_1, t_2, \ldots, t_n$”.

More complex formulae are built from atomic ones through the use of
logical connectives; some standard symbols denote conjunction ($\land$),
disjunction ($\lor$), negation ($\neg$) and implication ($\Rightarrow$) as well
as equivalence ($\Leftrightarrow$). When atomic formulae contain certain
variables, then the variables should appear within the scope of some
quantifier, i.e., the universal quantifier ($\forall$) or the existential
quantifier ($\exists$).

Logical formulae are assigned an interpretation, i.e., the meaning of
symbols and relations is defined. After assigning the interpretation, the
logical value (truth value) of the formula can be determined. A formula $\Psi$
which is true under any interpretation is called a tautology; one then writes
$\models \Psi$, which means that the formula $\Psi$ is always satisfied (under
any interpretation). The symbol $\models$ denotes the relation of a logical
consequence. An expression like $\Psi \models \Phi$ means that for any
interpretation for which the formula $\Psi$ is satisfied the formula $\Phi$
must also be satisfied. A formula which is not satisfied under any
interpretation is called an unsatisfiable formula; in other words, it is always
false, or inconsistent. In a general case checking for inconsistency may be
quite a complex problem and may require a complex inference
mechanism. In consistency-based diagnosis the crucial activity consists in the 
detection of the inconsistency of the formula defining the model of the sys- 
tem connected with the observations of the current output of the analyzed 
system. 
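
For the propositional case only, the notions of a tautology, a logical consequence and an unsatisfiable formula can be checked by enumerating all interpretations, as in the following Python sketch. Formulae are represented here as Python functions of an interpretation, which is an assumption made for brevity; formulae of first order predicate calculus require a more elaborate inference mechanism.

```python
from itertools import product


def interpretations(atoms):
    """Generate every assignment of truth values to the given atoms."""
    for values in product([False, True], repeat=len(atoms)):
        yield dict(zip(atoms, values))


def is_tautology(formula, atoms):
    return all(formula(i) for i in interpretations(atoms))


def is_unsatisfiable(formula, atoms):
    return not any(formula(i) for i in interpretations(atoms))


def entails(premise, conclusion, atoms):
    """Logical consequence: the conclusion holds wherever the premise holds."""
    return all(conclusion(i) for i in interpretations(atoms) if premise(i))


atoms = ["p", "q"]
print(is_tautology(lambda i: i["p"] or not i["p"], atoms))
print(entails(lambda i: i["p"] and i["q"], lambda i: i["p"], atoms))
print(is_unsatisfiable(lambda i: i["p"] and not i["p"], atoms))
```

In consistency-based diagnosis the third kind of check is the crucial one: the conjunction of the system model and the observations is tested for unsatisfiability.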

16.4.2. System model, system components and observations

Let us consider an example of a simple logic circuit, i.e., the so-called
single-bit full adder. The circuit is composed of two XOR gates (exclusive-or)
denoted by X1 and X2, two AND gates denoted by A1 and A2, and one OR
gate denoted by O1. The inputs of the adder are in1(X1) and in2(X1), fed with
the signals to be added, and in1(A2), fed with the bit of carry. The outputs
of the adder are out(X2), providing the result of summation, and out(O1),
providing the bit of carry. The circuit diagram is presented in Fig. 16.2.




Fig. 16.2. Circuit diagram of the full adder 



Note that the analyzed system is composed of five principal elements
(components, COMP or COMPONENTS), i.e., $\{A1, A2, X1, X2, O1\}$; its
correct behavior is described with a theory composed of the following
formulae of predicate calculus (Reiter, 1987):

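$$
\begin{array}{l}
\mathit{and}(0,0) = 0,\ \mathit{and}(0,1) = 0,\ \mathit{and}(1,0) = 0,\ \mathit{and}(1,1) = 1, \\
\mathit{or}(0,0) = 0,\ \mathit{or}(0,1) = 1,\ \mathit{or}(1,0) = 1,\ \mathit{or}(1,1) = 1, \\
\mathit{xor}(0,0) = 0,\ \mathit{xor}(0,1) = 1,\ \mathit{xor}(1,0) = 1,\ \mathit{xor}(1,1) = 0, \\
\mathit{ANDG}(X) \land \neg \mathit{AB}(X) \Rightarrow \mathit{out}(X) = \mathit{and}(\mathit{in1}(X), \mathit{in2}(X)), \\
\mathit{XORG}(X) \land \neg \mathit{AB}(X) \Rightarrow \mathit{out}(X) = \mathit{xor}(\mathit{in1}(X), \mathit{in2}(X)), \\
\mathit{ORG}(X) \land \neg \mathit{AB}(X) \Rightarrow \mathit{out}(X) = \mathit{or}(\mathit{in1}(X), \mathit{in2}(X)), \\
\mathit{ANDG}(A1),\ \mathit{ANDG}(A2),\ \mathit{XORG}(X1),\ \mathit{XORG}(X2),\ \mathit{ORG}(O1), \\
\mathit{out}(X1) = \mathit{in2}(A2), \\
\mathit{out}(X1) = \mathit{in1}(X2), \\
\mathit{out}(A2) = \mathit{in1}(O1), \\
\mathit{in1}(A2) = \mathit{in2}(X2), \\
\mathit{in1}(X1) = \mathit{in1}(A1), \\
\mathit{in2}(X1) = \mathit{in2}(A1), \\
\mathit{out}(A1) = \mathit{in2}(O1), \\
\mathit{in1}(X) = 0 \lor \mathit{in1}(X) = 1,\ \mathit{in2}(X) = 0 \lor \mathit{in2}(X) = 1,\ \mathit{out}(X) = 0 \lor \mathit{out}(X) = 1.
\end{array}
$$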

In order to form a complete theory, the description should be completed 
with axioms concerning equality and those of the Boolean algebra. The above 
formulae constitute a model of the correct behavior of the system, i.e., the 
so-called System Description (SD). The first three lines define the logical
functions of conjunction, disjunction and exclusive-or. The next three lines
contain the description of the work of the AND, XOR and OR gates; the 
predicate AB (or, more precisely, its negation) allows defining an assumption 
about the correct work of a certain gate. The next lines define the types of 
component elements and connections between them. The last line defines 
constraints imposed on the values of the input and output signals. 

The description is supplemented with the specification of the observed 
behavior of the system, i.e., Observations (OBS). In the case of the analyzed
example the observations are as follows:

$$\mathit{in1}(X1) = 1,\ \mathit{in2}(X1) = 0,\ \mathit{in1}(A2) = 1,\ \mathit{out}(X2) = 1,\ \mathit{out}(O1) = 0.$$

Note that, taking into account the current observations, the theory
describing the system model (SD) becomes inconsistent with them: both of
the observed outputs are different from their predicted values. The only
reasonable explanation is that at least one of the system components became
faulty; Reiter’s theory allows searching for and generating all sets of ‘suspect’ 
components, i.e., such that assuming them faulty explains the misbehavior 
of the system. Such sets of components form potential diagnoses. They are 
formed by retracting some of the assumptions about the correct work of sys- 
tem components. Note that even in the case of this relatively simple system 
its complete model specified with the use of first order logic is quite extensive, 
and the analysis of this model appears to be a non-trivial task. 
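
The discrepancy between prediction and observation for the full adder can be checked with a few lines of code. The following is only an illustrative sketch: the function name and the dictionary keys are hypothetical, and the gates are modeled directly with Python's bitwise operators rather than with the first-order axioms of SD.

```python
def full_adder_prediction(in1_x1, in2_x1, in1_a2):
    """Predict the two observed outputs assuming every gate is correct."""
    out_x1 = in1_x1 ^ in2_x1   # X1: exclusive-or of the two inputs
    out_x2 = out_x1 ^ in1_a2   # X2: in1(X2) = out(X1), in2(X2) = in1(A2)
    out_a1 = in1_x1 & in2_x1   # A1 shares its inputs with X1
    out_a2 = in1_a2 & out_x1   # A2: in2(A2) = out(X1)
    out_o1 = out_a2 | out_a1   # O1: in1(O1) = out(A2), in2(O1) = out(A1)
    return {"out(X2)": out_x2, "out(O1)": out_o1}


# Observations from the text: in1(X1) = 1, in2(X1) = 0, in1(A2) = 1.
predicted = full_adder_prediction(1, 0, 1)
observed = {"out(X2)": 1, "out(O1)": 0}

# Outputs whose observed value differs from the predicted one.
discrepancies = {name for name in observed if observed[name] != predicted[name]}
```

Under these observations both outputs end up in `discrepancies`, which is the inconsistency referred to in the text.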

Let us consider another system, frequently discussed as a diagnostic ex- 
ample in the literature (Reiter, 1987). It is a system composed of three mul- 
tipliers and two adders; here these components operate on integer numbers. 
A schematic diagram of the system is presented in Fig. 16.3. 

The set of the components of the system under consideration is now given
by $COMP = \{m1, m2, m3, a1, a2\}$. The elements m1, m2 and m3 perform
multiplication while the elements a1 and a2 perform addition.

A somewhat simplified model of the system specifying its behavior can 
be stated as follows: 

$ADD(x) \land \neg AB(x) \Rightarrow Output(x) = Input1(x) + Input2(x),$

$MULT(x) \land \neg AB(x) \Rightarrow Output(x) = Input1(x) * Input2(x),$

$ADD(a1),\ ADD(a2),\ MULT(m1),\ MULT(m2),\ MULT(m3),$

$Output(m1) = Input1(a1),\ Output(m2) = Input2(a1),$

$Output(m2) = Input1(a2),\ Output(m3) = Input2(a2),$

$Input2(m1) = Input1(m3),\ X = A * C,\ Y = B * D,\ Z = C * E,$

$F = X + Y,\ G = Y + Z.$

Fig. 16.3. Schematic diagram of the arithmetic system

A more complete model should contain also the definitions of the opera- 
tions of multiplication and addition and their properties as well as the prop- 
erties of the equality relation; however, for the presentation of this example 
these details are not necessary. 

The currently observed behavior of the system is specified by the values
of the input and output signals $OBS = \{A = 3, B = 2, C = 2, D = 3, E = 3,
F = 10, G = 12\}$. Even a superficial analysis shows that the system must
be faulty: the value $F = 12$ predicted with the use of the model
($X = A * C = 6$, $Y = B * D = 6$, $F = X + Y = 12$) differs from the
observed value $F = 10$. Hence, also in this
case the assumptions about the correct work of system components and the 
theory describing its behavior are inconsistent with the observations. 
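
The same check can be written down for the arithmetic system. This is a minimal sketch, with hypothetical function and variable names, that simply evaluates the model equations under the assumption that all five components work correctly and compares the result with the observed outputs.

```python
def predict_outputs(A, B, C, D, E):
    """Evaluate the model assuming all multipliers and adders are correct."""
    X = A * C   # m1
    Y = B * D   # m2
    Z = C * E   # m3
    return {"F": X + Y, "G": Y + Z}   # a1 and a2


observations = {"A": 3, "B": 2, "C": 2, "D": 3, "E": 3, "F": 10, "G": 12}
predicted = predict_outputs(*(observations[name] for name in "ABCDE"))

# Output signals whose observed value contradicts the prediction.
misbehaving = [name for name in ("F", "G") if predicted[name] != observations[name]]
```

Only F ends up in `misbehaving`; the observed value of G agrees with the model.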

As another example let us consider the model of a widely used dynam- 
ic system composed of three interconnected tanks (Koscielny, 2001; Gorny, 
2001). A schematic diagram of the system is presented in Fig. 16.4. 




Fig. 16.4. Three-tank system

The components of the system selected for diagnostic purposes are specified
by $COMPONENTS = \{k1, k12, k23, k3, z1, z2, z3\}$; they are the channels
(responsible for the flow) and the tanks (responsible for the volume of
the liquid). The signals to be observed are $L_1$, $L_2$ and $L_3$, which
describe the level of the liquid in the consecutive tanks, and the signal $U$
controlling the input valve. The model of the system is specified by means of
the following set of differential equations:



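$$f(U) = F, \tag{16.8}$$

$$A_1 \frac{dL_1}{dt} = F - F_{12}, \tag{16.9}$$

$$A_2 \frac{dL_2}{dt} = F_{12} - F_{23}, \tag{16.10}$$

$$A_3 \frac{dL_3}{dt} = F_{23} - F_3, \tag{16.11}$$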



where $F_{ij} = a_{ij} C_{ij} \sqrt{2g(L_i - L_j)}$, $F_3 = a_3 C_3 \sqrt{2gL_3}$,
$A_i$ denote the cross-sectional areas of the tanks for $i = 1, 2, 3$, and
$C_{ij}$, $C_3$ denote the cross-sectional areas of the channels connecting
the tanks for $ij = 12, 23$. Note that in this system even a superficial
analysis becomes a non-trivial task; this is a consequence of the fact that
here the analysed system is a highly interconnected dynamic one, it is
described with non-linear equations and there exists strong feedback in it.
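
To give a feeling for the dynamics, the sketch below integrates a three-tank model with the explicit Euler method. It is an assumption-laden illustration, not the model used in the cited works: the level balances are taken to be the usual mass balances (inflow minus outflow for each tank), the inflow F is passed in directly instead of being computed from the valve signal U, each product of flow coefficient and channel area is lumped into a single hypothetical coefficient, and all numerical values are arbitrary.

```python
import math


def simulate_three_tanks(inflow, areas=(1.0, 1.0, 1.0), k=(0.01, 0.01, 0.01),
                         levels=(0.0, 0.0, 0.0), dt=0.1, steps=1000, g=9.81):
    """Explicit Euler integration of an assumed three-tank mass balance."""
    a1, a2, a3 = areas
    k12, k23, k3 = k            # lumped coefficients (assumed values)
    l1, l2, l3 = levels

    def flow(coeff, head):
        # Square-root flow law; the sign follows the sign of the head.
        return math.copysign(coeff * math.sqrt(2.0 * g * abs(head)), head)

    for _ in range(steps):
        f12 = flow(k12, l1 - l2)
        f23 = flow(k23, l2 - l3)
        f3 = flow(k3, l3)
        # Levels are clipped at zero because a tank cannot be emptier than empty.
        l1 = max(l1 + dt * (inflow - f12) / a1, 0.0)
        l2 = max(l2 + dt * (f12 - f23) / a2, 0.0)
        l3 = max(l3 + dt * (f23 - f3) / a3, 0.0)
    return l1, l2, l3


l1, l2, l3 = simulate_three_tanks(inflow=0.02)
```

Because every level depends on its neighbours through a non-linear flow law, a change in one channel or tank affects all three levels, which is why conflict generation for this system is harder than for the static examples.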

16.4.3. Conflict sets 

The idea of a conflict set (or just conflict for simplicity) is of key importance 
for the theory of consistency-based diagnostic reasoning with the use of the 
system model. A conflict set is any subset of distinguished system elements,
i.e., COMPONENTS, such that the items belonging to such a set cannot all be
assumed to work correctly (i.e., at least one of them must be faulty) - it is
the assumption about their joint correct work that leads to inconsistency.

Assume we consider a system specified by (SD, COMPONENTS), where
SD is the theory describing the work of the system (i.e., System Description)
and where $COMPONENTS = \{c_1, c_2, \ldots, c_n\}$ is the set of distinguished
system elements. Any of these elements can become faulty and the output
of the diagnostic procedure is restricted to be a subset of the elements of
COMPONENTS.

In diagnostic reasoning it is assumed that the correct behavior of the
system is fully described by the theory SD. Assumptions about the correct
work of system components take the form

$$\neg AB(c_1) \land \neg AB(c_2) \land \ldots \land \neg AB(c_n). \tag{16.12}$$







Hence, assuming that the observed behavior is described with the formulae
of the set OBS and when there are no faults, the set given by

$$SD \cup \{\neg AB(c_1), \neg AB(c_2), \ldots, \neg AB(c_n)\} \cup OBS \tag{16.13}$$



should be consistent. In the case of a failure, however, the set of
formulae (16.13) turns out to be inconsistent. In order to regain consistency
one should partially withdraw the assumptions of the form $\neg AB(c_i)$ about
the correct work of system components. Such an approach leads to one or
several sets of the form $\{c^1, c^2, \ldots, c^k\} \subseteq COMPONENTS$ of
components such that at least one of them must have become faulty. From a
logical point of view, the assumptions about such a conflict set are
equivalent to stating that the formula

$$AB(c^1) \lor AB(c^2) \lor \ldots \lor AB(c^k) \tag{16.14}$$

is true. The formula (16.14) is true if $AB(c^i)$ holds for at least one
$i \in \{1, 2, \ldots, k\}$.

From the analysis of the first example - the one of the full adder - and
under the assumed observations, the following sets are conflicts (Reiter, 1987):
{X1, X2}, {X1, A2, O1}.

Since the output signal out(X2) = 1 is faulty, at least one of the elements
taking part in its generation must be faulty. There are two such elements, i.e.,
X1 and X2, which leads to the first conflict set. Moreover, also out(O1) = 0
is incorrect. This signal is generated with the use of the components X1, A1,
A2 and O1. But if three of them, i.e., X1, A2 and O1, were correct, then
the output of O1 would be 1; this leads to the second conflict. The set
{X1, A1, A2, O1} is not a minimal conflict set since the inconsistency arises
regardless of A1, which may be working correctly.

In the case of the example of the arithmetic system, under the assumed
values of the input and output signals the conflict sets are {a1, m1, m2},
{a1, a2, m1, m3}.

Since the value of the output signal F = 10 is faulty (inconsistent with
the predicted value 12), the elements generating the signal cannot be
all working correctly; this leads to the first conflict of the form {a1, m1, m2}.
Further, note that F can be also predicted in another way. If m3 and a2 were
correct, then it could be calculated that Z = 6 and Y = G - Z = 6, and
assuming that m1 and a1 are correct we have X = 6 and F = 12, which leads to
the second conflict of the form {a1, a2, m1, m3}.

In the case of the three-tank system, the possibility of conflict generation
depends on the observed symptoms. For example, if an incorrect level of
the liquid in the first tank were observed, then the conflict set would be
{k1, z1, k12}; this is so since if all the elements were correct, the correct
level would be preserved.

16.4.4. Theory of consistency-based diagnostic reasoning
(Reiter’s theory)

The theory of diagnostic reasoning using a system model and based on the 
analysis of the inconsistency of the observed system behavior with the one 
predicted with the use of the system model was described by Reiter (1987). 
Below the basic ideas of this theory are presented in brief. 

The basic definition in this theory is the definition of a system. 

Definition 16.1. A system is a pair (SD, COMPONENTS) where:

1. SD is a set of first-order predicate calculus formulae defining the system,
i.e., the System Description;

2. COMPONENTS is a set of constants representing distinguished elements
of the system.

Some examples of such systems have already been presented in this
chapter. The model of the system SD describes its correct behavior. The
distinguished elements appear in the model (SD), and they are the only
elements which can be considered faulty - the potential diagnoses will be
built from these elements only. In order to do this, the system model is
completed with formulae of the form $\neg AB(c_i)$, which means that the
component $c_i$ works correctly; for any element of the set
$COMPONENTS = \{c_1, c_2, \ldots, c_n\}$ the formula $\neg AB(c_i)$ may appear
just once or many times.

The current behavior of the system is assumed to be observed and the 
values of certain variables can be measured. The representation of these ob- 
servations can be formalized in the form of a set of first-order logic formulae; 
let us denote this set by OBS. 

An assumption of the form $\{\neg AB(c_1), \ldots, \neg AB(c_n)\}$ means in fact
that every component of the analyzed system works correctly. Hence, a set of
the form

$$SD \cup \{\neg AB(c_1), \ldots, \neg AB(c_n)\} \tag{16.15}$$

represents the correct behavior of the system, i.e., behavior which can be
observed on the assumption that no components are faulty. When at least
one of the components $c_i \in COMPONENTS$ becomes faulty, the set of
formulae of the form

$$SD \cup \{\neg AB(c_1), \ldots, \neg AB(c_n)\} \cup OBS \tag{16.16}$$

will become inconsistent. The diagnostic process consists in searching for
components which may have become faulty.

Intuitively, a diagnosis in Reiter’s theory is a hypothesis stating that some
set of system elements being a subset of COMPONENTS became faulty. Making
such an assumption must restore the consistency of the observed system
behavior with the one predicted with the use of the model. For simplicity,
only minimal diagnoses will be considered explicitly.

Definition 16.2. A diagnosis for the system with observations specified by
(SD, COMPONENTS, OBS) is any minimal set $\Delta \subseteq COMPONENTS$, such
that the set

$$SD \cup OBS \cup \{AB(c) \mid c \in \Delta\} \cup \{\neg AB(c) \mid c \in COMPONENTS - \Delta\} \tag{16.17}$$

is consistent.

Roughly speaking, one may say that a diagnosis for some system failure
which results in the observed misbehavior is any minimal set composed of
system components, such that assuming all of them to be faulty and assuming
that all the other elements work correctly is sufficient to restore the
consistency of the observed behavior with the system model.

16.4.5. Generation of conflict sets and diagnoses - 
a constructive approach 

A direct search for diagnoses in the form of minimal sets of components 
sufficient for regaining the consistency between the observed and predict- 
ed behavior based on the analysis of the set of formulae given by (16.17) 
would be a tedious task from the computational point of view. In the case of 
multiple faults one would have to search for all single-element subsets, then 
two-element subsets of COMPONENTS, etc., and each time such a subset 
should be verified to see if it constitutes a diagnosis; some simplifications 
may consist in the elimination of any superset of a diagnosis found earlier. 
In Reiter’s theory further improvements are proposed. 
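
The direct search described above can be sketched for the full adder as follows. This is an illustration under explicit assumptions rather than a general procedure: consistency is decided by enumerating the values of the three unobserved signals instead of calling a theorem prover, a faulty component is assumed to impose no constraint at all (SD describes only correct behavior), and all identifiers are hypothetical.

```python
from itertools import combinations, product

COMPONENTS = ["A1", "A2", "X1", "X2", "O1"]
# Observations: in1(X1) = 1, in2(X1) = 0, in1(A2) = 1, out(X2) = 1, out(O1) = 0.
A, B, C_IN, OUT_X2, OUT_O1 = 1, 0, 1, 1, 0


def consistent(correct):
    """True if some values of the unobserved signals out(X1), out(A1), out(A2)
    satisfy the constraints of all components assumed to be correct."""
    for x1, a1, a2 in product((0, 1), repeat=3):
        checks = {
            "X1": x1 == (A ^ B),
            "X2": OUT_X2 == (x1 ^ C_IN),
            "A1": a1 == (A & B),
            "A2": a2 == (C_IN & x1),
            "O1": OUT_O1 == (a2 | a1),
        }
        if all(checks[c] for c in correct):
            return True
    return False


def diagnoses():
    """Enumerate candidate fault sets by increasing size, keeping minimal ones."""
    found = []
    for size in range(len(COMPONENTS) + 1):
        for faulty in combinations(COMPONENTS, size):
            delta = set(faulty)
            if any(d <= delta for d in found):
                continue  # a superset of a diagnosis is not minimal
            if consistent(set(COMPONENTS) - delta):
                found.append(delta)
    return found
```

Even for five components the procedure tests many subsets; the number of candidates grows exponentially with the number of components, which motivates the conflict-based approach.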

First, let us introduce a formal definition of a conflict set (conflict). 

Definition 16.3. A conflict set for (SD, COMPONENTS, OBS) is any set
$\{c_1, \ldots, c_k\} \subseteq COMPONENTS$, such that

$$SD \cup OBS \cup \{\neg AB(c_1), \ldots, \neg AB(c_k)\} \tag{16.18}$$

is inconsistent.

A conflict set is minimal if none of its proper subsets is a conflict set. 
For intuition, a conflict set (under given observations and system model) is a 
set of components such that at least one of its elements must be faulty. Any 
conflict set represents in fact a disjunction of potential faults. Note that if 
the analysis is restricted to minimal conflicts, then removing a single element 
from such a conflict set makes this set no longer a conflict. In other words, 
the system regains consistency. 
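
Definition 16.3 can be tried out on the arithmetic system. The sketch below is a brute-force illustration, not an efficient method: consistency of (16.18) is decided by searching for values of the unmeasured variables X, Y, Z in a small integer range (an assumption that happens to be wide enough for this example), and all identifiers are hypothetical.

```python
from itertools import combinations, product

OBS = {"A": 3, "B": 2, "C": 2, "D": 3, "E": 3, "F": 10, "G": 12}
COMPONENTS = ["a1", "a2", "m1", "m2", "m3"]


def consistent(correct, domain=range(0, 21)):
    """True if some X, Y, Z satisfy the constraints of the correct components."""
    for X, Y, Z in product(domain, repeat=3):
        checks = {
            "m1": X == OBS["A"] * OBS["C"],
            "m2": Y == OBS["B"] * OBS["D"],
            "m3": Z == OBS["C"] * OBS["E"],
            "a1": OBS["F"] == X + Y,
            "a2": OBS["G"] == Y + Z,
        }
        if all(checks[c] for c in correct):
            return True
    return False


def minimal_conflicts():
    """Subsets whose joint correctness is inconsistent, with no conflict inside."""
    found = []
    for size in range(1, len(COMPONENTS) + 1):
        for subset in combinations(COMPONENTS, size):
            candidate = set(subset)
            if any(c <= candidate for c in found):
                continue  # contains a smaller conflict, hence not minimal
            if not consistent(candidate):
                found.append(candidate)
    return found
```

For the observations used in the text, the search returns the two minimal conflicts discussed earlier, {a1, m1, m2} and {a1, a2, m1, m3}.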

Now, let us define an important concept, i.e., a hitting set. 

Definition 16.4. Let C be any family of sets. A hitting set for C is any
set $H \subseteq \bigcup_{S \in C} S$ such that $H \cap S \neq \emptyset$ for
any set $S \in C$.

A hitting set is minimal if and only if none of its proper subsets is a hitting
set for C.

For intuition, a hitting set is any set having a non-empty intersection with
any conflict set; it is minimal if removing from it any single element violates
the requirement of the non-empty intersection with at least one conflict set.

Having defined the idea of a conflict set and a hitting set we can present
the basic theorem of Reiter’s theory:

Theorem 16.1. $\Delta \subseteq COMPONENTS$ is a diagnosis for (SD,
COMPONENTS, OBS) if and only if $\Delta$ is a minimal hitting set
for the family of conflict sets for (SD, COMPONENTS, OBS).

Since any superset of a conflict set for (SD, COMPONENTS, OBS) is
also a conflict set, it can be shown that H is a minimal hitting set for the
family of all conflict sets for (SD, COMPONENTS, OBS) if and only if H is a
minimal hitting set for the family of all minimal conflict sets defined for
(SD, COMPONENTS, OBS). This observation (proved in (Gorny, 2001)) leads to
the following result, which is fundamental to Reiter’s theory:

Remark 16.1. $\Delta \subseteq COMPONENTS$ is a diagnosis for (SD,
COMPONENTS, OBS) if and only if $\Delta$ is a minimal hitting set for
the family of minimal conflict sets for (SD, COMPONENTS, OBS).

To summarize, the role of conflict sets in Reiter’s theory is providing
specifications of components, such that for each conflict set at least one
element must be faulty. By restricting the analysis to minimal conflicts one
is sure that removing any single element from such a set leads to the
elimination of the conflict. Hence, a set containing at least one element
from every conflict (i.e., a hitting set) allows regaining global
consistency; it constitutes then a (potential) diagnosis. Of course, for the
analyzed system failure described by (SD, COMPONENTS, OBS) there can exist
many diagnoses explaining the observed misbehavior.

For example, for the previously analyzed full adder two minimal conflicts
were defined: {X1, X2} and {X1, A2, O1}. It can be easily seen that in such
a case there exist three potential diagnoses: $D_1 = \{X1\}$, $D_2 = \{X2, A2\}$
and $D_3 = \{X2, O1\}$.

In the case of the arithmetic system the following conflicts were found:
{a1, m1, m2}, {a1, a2, m1, m3}. On their basis all the potential diagnoses
can be easily found, i.e., $D_1 = \{a1\}$, $D_2 = \{m1\}$, $D_3 = \{a2, m2\}$,
$D_4 = \{m2, m3\}$. Let us note that only minimal diagnoses are found (i.e., if
some set is a diagnosis, then any of its supersets will not be generated as a
diagnosis), and that Reiter’s theory allows the generation of both
single-element diagnoses (single faults) as well as multi-element ones
(multiple faults).
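
The passage from minimal conflicts to diagnoses can be reproduced with a small brute-force routine. The sketch below is not Reiter's hitting-set tree algorithm; it simply enumerates candidate sets by increasing size, which is adequate for such small examples. The function name is hypothetical.

```python
from itertools import combinations


def minimal_hitting_sets(conflicts):
    """Return all minimal sets that intersect every set in `conflicts`."""
    universe = sorted(set().union(*conflicts))
    hitting = []
    for size in range(1, len(universe) + 1):
        for candidate in combinations(universe, size):
            chosen = set(candidate)
            if any(h <= chosen for h in hitting):
                continue  # contains a smaller hitting set, so it is not minimal
            if all(chosen & conflict for conflict in conflicts):
                hitting.append(chosen)
    return hitting


adder_diagnoses = minimal_hitting_sets([{"X1", "X2"}, {"X1", "A2", "O1"}])
arithmetic_diagnoses = minimal_hitting_sets(
    [{"a1", "m1", "m2"}, {"a1", "a2", "m1", "m3"}]
)
```

For the two examples the routine yields the three diagnoses of the full adder and the four diagnoses of the arithmetic system listed above.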

16.4.6. Search for conflicts; potential conflict structures 

In the general case, the search for conflicts is not an easy task. In the
original work by Reiter (1987) no efficient method for conflict generation
was given. To make things worse, in the general case it is necessary to use
an automated theorem prover for proving the inconsistency of the set
$SD \cup OBS \cup \{\neg AB(c) \mid c \in COMPONENTS\}$ in order to find all
refutations of it; for any such refutation the instances of the predicate
$AB(\cdot)$ used in it should be collected because they form the conflict
sets. Of course, the applied theorem prover should be sound and complete. In
conclusion, in the general case the task of finding all minimal conflicts is
difficult to accomplish and computationally complex.

A potential conflict structure is a subgraph of the causal graph sufficient
for conflict generation; this idea was first introduced in (Ligęza, 1996).
Then, in (Gorny, 2001; Gorny and Ligęza, 2000; 2001; Ligęza and Gorny, 1999;
2000), an approach to an automated search for conflict sets for a wide class
of dynamic systems was proposed; it is based on the use of a causal graph
representing the flow of signals in the analyzed system. The use of such a
graph simplifies the procedure of conflict generation and allows performing
a relatively efficient search for all potential minimal conflicts. Next, as
a result of the computational verification of potential conflicts, those
which are not real ones are eliminated.

Let us introduce the definition of a causal graph: 

Definition 16.5. By a causal graph we shall understand a set of nodes
representing system variables and a set of edges describing mutual influences
found among the variables. The edges of the graph are assigned equations
describing the influences in a quantitative way, and the variables are assigned
some domains.

Let us introduce the following notation: 

• A, B, C, ... - measurable variables,

• [X] - unmeasurable variables,

• X* - a conflicting variable, i.e., one taking a value inconsistent with
model-based prediction,

and let (→) denote the existence of a causal influence between two variables.
Any such influence is assigned an expression of the form

$$i = ([X_1, X_2, \ldots, X_k], f, Y, [c_1, c_2, \ldots, c_k, c_Y]), \tag{16.19}$$

where $X_1, X_2, \ldots, X_k$ are input variables, $f$ is a function, $Y$ is an
output variable and $c_1, c_2, \ldots, c_k$ are system components responsible
for the correct work of subsystems generating the input values, and where
$c_Y$ is the component responsible for the value of the output variable $Y$.

For example, a causal graph for the previously analyzed arithmetic unit 
has the structure as in Fig. 16.5. Recall that the variable F took an incorrect 
value; according to the accepted notation, in Fig. 16.5 it is marked with
an asterisk.

Fig. 16.5. Causal graph for the arithmetic unit



The core idea of using a causal graph for the search for conflicts is based 
on the following observations and assumptions: 

• the existence of any conflict is indicated by the misbehavior of some
variable (behavior different from the predicted one),

• in order to state that a conflict exists, the current value of a variable
(observed or measured) must be different from the one predicted with the use
of the model,

• the conflict set will be composed of components responsible for the
correct value of the misbehaving variable.

So, when the causal graph for the analyzed system is defined, the detection
of all conflicts requires the detection of all misbehaving variables, and then
a search through the graph in order to find all the sets of components
responsible for the observed misbehavior.
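
The search described above can be illustrated on the arithmetic unit. The sketch below is a simplified, forward-only version: each influence is stored as a triple (inputs, function, responsible components), the value of a measured variable is re-computed from upstream measurements, and the components met on the way are collected. It does not reverse any calculation, so it finds only the conflict obtained by direct prediction; the data layout and all names are hypothetical.

```python
# output variable -> (input variables, function, responsible components)
INFLUENCES = {
    "X": (["A", "C"], lambda a, c: a * c, ["m1"]),
    "Y": (["B", "D"], lambda b, d: b * d, ["m2"]),
    "Z": (["C", "E"], lambda c, e: c * e, ["m3"]),
    "F": (["X", "Y"], lambda x, y: x + y, ["a1"]),
    "G": (["Y", "Z"], lambda y, z: y + z, ["a2"]),
}
MEASURED = {"A": 3, "B": 2, "C": 2, "D": 3, "E": 3, "F": 10, "G": 12}


def predict(variable):
    """Return (predicted value, components used), stopping at measured inputs."""
    inputs, function, components = INFLUENCES[variable]
    used = set(components)
    arguments = []
    for name in inputs:
        if name in MEASURED:
            arguments.append(MEASURED[name])
        else:
            value, upstream = predict(name)
            arguments.append(value)
            used |= upstream
    return function(*arguments), used


def conflicts_from_predictions(tolerance=0):
    """Map each misbehaving measured variable to the components behind it."""
    result = {}
    for variable in INFLUENCES:
        if variable in MEASURED:
            predicted, used = predict(variable)
            if abs(predicted - MEASURED[variable]) > tolerance:
                result[variable] = used
    return result
```

Here only F misbehaves, and the components collected for it form the conflict {a1, m1, m2}. Finding {a1, a2, m1, m3} would additionally require computing Y backwards from G and Z, which this sketch does not attempt.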

It seems helpful for the discussion to introduce the idea of a Potential
Conflict Structure, or PCS for short (Ligęza, 1996; Gorny, 2001).

Definition 16.6. A PCS defined for the variable X on m variables is any
subgraph of the causal graph, such that

• it contains exactly m variables (including X),

• the values of all variables are measured or calculated (they are well
defined),

• the value of the variable X is doubly defined (e.g., measured and
calculated with the use of the values of other variables),

• in the PCS considered all the values of m variables are necessary for
X in order to be doubly defined.

A structure defined as above allows potential conflict generation. Some
examples of a PCS for the measured variables F, G and H are shown in Fig. 16.6.

[Figure 16.6: causal graph with the measured variables F, G and H; the figure
lists the potential conflicts {c1, c2, c3, c4}, {c1, c2, c3, c5},
{c1, c2, c3, c6}, {c4, c5}, {c5, c6}, {c4, c6}.]

Fig. 16.6. Example of a PCS



If calculations are reversible, i.e., an input variable can be calculated
from the output and other inputs, then the PCS shown in Fig. 16.6 is also a
conflict structure for the unmeasured variable X, and even for the variables
A, C. A potential conflict becomes a real one if two values of the same
variable (determined in different ways) are found to be significantly different
(some small differences between the values can result from noise or imprecise
measurements).

For the analyzed arithmetic unit there exist two potential conflict struc- 
tures, which in fact describe real conflicts; they are shown in Fig. 16.7 and 
Fig. 16.8. 




Fig. 16.7. Conflict set {a1, m1, m2}



From Reiter’s theory it follows that diagnoses are generated as minimal 
hitting sets of all minimal conflicts. In Fig. 16.9 the way of constructing the 
generated diagnoses is presented. 

Causal graphs can constitute a very useful tool for tracing potential causes
in case of a failure; in connection with Reiter’s theory they allow performing
an automated search for conflict sets and the generation of potential
diagnoses.

Fig. 16.8. Conflict set {a1, a2, m1, m3}

Fig. 16.9. Minimal hitting sets



16.4.7. Examples of application 

Reiter’s theory in its pure form requires a model of the analyzed system in
the form of a set of first-order logic formulae. It also requires the application
of an automated theorem prover, e.g., one based on the resolution method by
Robinson. Another requirement is that the model must completely describe
the correct behavior of the system and the theorem prover must assure sound
and complete derivations. For those reasons applications of Reiter’s theory
in its pure form are limited. Examples presented most frequently refer to
the diagnostics of logical circuits (especially those performing functions
at the level of propositional calculus), since for such circuits it is
relatively simple to build a logical model in the form of first-order
predicate calculus formulae.

A classical example quoted in numerous papers is the one-bit full adder
system; there are also examples of diagnosing multi-bit adders. The first
papers on, and applications of, the elements of an informal diagnostic theory
based on the consistency analysis date back to 1976 (de Kleer, 1976); among
other things, the concept of the conflict set was formulated there. The
diagnostic system Local, developed by de Kleer, constituted a component of
the system Sophie III for computer-aided instruction.

Diagnostic systems making use of the model of a system to be diagnosed
and model-based reasoning found applications mainly in the domain of
electronic circuit diagnosis (both digital and analog ones), but also in
neurophysiology, the diagnostics of hydraulic systems, and other
technological systems. Many examples of such diagnostic systems are mentioned
in (Davis and Hamscher, 1992), e.g., INTER (de Kleer, 1976), ABEL (Patil et
al., 1981), HT (Davies et al., 1982), DART (Genesereth, 1984), GDE (de Kleer
and Williams, 1987), or DEDALE (Dague et al., 1987).

In the literature one can come across many other examples of applications
of diagnostic systems making use of the system model and based on Reiter’s
theory. Some modifications of the GDE system (General Diagnostic Engine)
were presented in (de Kleer, 1991) (the Sherlock system) and in (Bakker and
Bourseau, 1992) (the PDE system). In (Mozetic, 1991), the concept of a
hierarchical diagnostic system was put forward; it was used in heart
diagnosis. A diagnostic system IDA (Incremental Diagnostic Algorithm) was
successfully applied to the diagnosis of a 500-bit adder (composed of about
2500 logical gates) (Mozetic, 1992). A review of diagnostic approaches and
their applications to the diagnosis of digital and analog electronic circuits
and to dynamic systems described by means of differential equations can be
found in (Hamscher et al., 1992). Further developments of the theory are
presented in the series of the DX conferences; for instance, in (Abu-Hakima,
1996), examples of applications to the diagnosis of the fuel supply system of
airplane engines, analog systems, cars, fuel supply systems of ship engines,
dynamic systems, technological installations and even computer programs are
presented.

16.4.8. CA-EN and TIGER systems 

An ambitious attempt at applying model-based methods to building a
diagnostic system for the monitoring and diagnosis of gas turbines was made
in the TIGER project (Milne and Trave-Massuyes, 1997; Milne et al., 1994;
1995; Trave-Massuyes, 1997a). The current version of the system is in
continuous use in numerous technological installations and it has proved to
be useful in engineering practice (Milne and Nicol, 2000). One of the
diagnostic subsystems in TIGER is the CA-EN system, performing diagnostic
reasoning based on the system model with the use of causal graphs (Bousson
and Trave-Massuyes, 1993; Bousson et al., 1994).



16.5. Logical causal graphs 

Logical causal graphs constitute a tool for the direct modeling of causal
dependencies in the diagnosed system. The idea of a causal graph incorporating
logical functions is presented below. Such graphs describe the behavior of the
system, both correct and faulty, at the level of abstraction of a
propositional logic model. The nodes of the graph represent symptoms which
can be observed in the analyzed system and its edges show the causal
relationships; moreover, in such graphs logical functions describing
interdependencies between the symptoms can be represented.

Let N denote a finite set of symptoms describing the behavior of the
analyzed system. It is further assumed that this set is composed of three
disjoint subsets, namely M, V and D, $N = D \cup V \cup M$, where

• D is a set of primary symptoms, i.e., such that none of their causes
will be further searched for in the analyzed system; these symptoms will
constitute the so-called elementary diagnoses; according to the terminology
used in (Koscielny, 2001), such symptoms denote single faults of the analyzed
system;

• M is a set of terminal symptoms, i.e., symptoms describing system 
behavior from the point of view of an external observer, which are not con- 
sidered to be further causes of other symptoms in the context of the analyzed 
system; it is assumed that the symptoms of M are all always observable; 

• V is a set of intermediate symptoms, i.e., symptoms which are caused 
by other symptoms and which may cause the occurrence of other symptoms 
as well; they can be observable or not (in certain cases they can be detected 
by specific measurements or tests), and in the case of inobservability they 
can be confirmed only in an intermediate way, i.e., through the observation 
of other symptoms causally and logically related to them. 

It is assumed that the causal dependencies of the type (16.7) in the system
between the symptoms of N are explicitly defined. Let $\Psi$ denote the set of
all such dependencies. A logical causal graph is defined then as follows:

Definition 16.7. A logical causal graph is a pair $G = (N, \Psi)$.

In practice it is sufficient to use the most typical logical functions, i.e.,
conjunction (AND), disjunction (OR) and negation (NOT); the graphs using
logical symptoms and the three functions mentioned above will be referred
to as AND/OR/NOT graphs.

16.5.1. AND/OR/NOT graphs

In the majority of applications for the description of logical causal
dependencies between symptoms it is enough to use the following three basic
logical functors: conjunction, disjunction and negation.

Definition 16.8. A logical causal graph of the AND/OR/NOT type is a pair
$G = (N, \Psi)$, where $\Psi = \{AND, OR, NOT\}$; the logical functors allow
any finite number of arguments.

The logical causal graph allows defining the structure of causal
dependencies in the diagnosed system. For simplicity, it is assumed that the
graph does not contain cycles¹. The nodes of D do not have predecessors, and
the nodes of M do not have successors. An example of a simple logical causal
graph of the type AND/OR/NOT is presented in Fig. 16.10.

¹ For the sake of simplicity no temporal dependencies are introduced; without
the temporal marking the existence of cycles would lead to the infinite
chaining of diagnostic inference.

Fig. 16.10. Example of a logical causal graph of the AND/OR/NOT type

The nodes of the graph may be assigned logical values of symptoms
represented by the nodes; symptoms in logical graphs are considered equivalent
to atomic formulae of propositional logic. By $X^+$ let us denote any set of
symptoms assigned the value true, and by $X^-$ any set of symptoms assigned
the value false. Having defined the state of some symptoms one may require
an explanation of this state in the form of elementary diagnoses and their
logical values implying the observed state. The state of some symptoms can
be known by observations, measurements, tests, monitoring, etc., and for
some other symptoms the state can be deduced. Let us define the diagnostic
problem.

Definition 16.9. Let G be an AND/OR/NOT logical causal graph. A diagnostic
problem is any five-tuple of the form

$$(G, M^+, M^-, N^+, N^-), \tag{16.20}$$

where $M^+ \subseteq M$ and $M^- \subseteq M$ are distinguished sets of
terminal symptoms (manifestations), the true and the false ones respectively,
which require a diagnosis, while $N^+ \subseteq N$ and $N^- \subseteq N$
constitute sets of some other symptoms (observations), true and false
respectively, which define the context of the failure.

The sets $N^+$ and $N^-$ provide some auxiliary information allowing us to
control diagnostic inference better and to exclude some potential hypotheses.
Note that both the positive and negative values of symptoms can indicate the
existence of a fault in the system. Moreover, the definition of the diagnostic
problem is very general; in fact, one can require not only an explanation
of some faulty outputs, but also of some other states of the system, whether
they are incorrect or not. Determining a diagnostic problem as a failure would
require a prior definition of the class of correct states - in such a case, a
failure would be any state not belonging to the correct states. Another
possibility is to define explicitly some set of faulty states. Both of the
approaches are admissible and can be applied depending on the available
knowledge about the system (its logical causal model).

16.5.2. Information propagation in logical causal graphs; 
the state of the graph 

When logical values of certain symptoms are known, the corresponding nodes 
of the graph can be assigned logical values true or false. This means that the 
state of the diagnosed system, analyzed at the level of symptoms describing 
its behavior, can be determined or at least partially determined - not all 
values of the symptoms must be known. 

Information propagation and the search for a diagnosis explaining the 
observed failure in logical causal graphs are carried out according to the 
rules of logical inference. The search for diagnoses is performed backwards, 
i.e., it is based on abductive reasoning. It can be performed with the use of 
any graph-search procedure, a systematic or heuristic one. A core step of any 
such procedure is the change of the state of one or more symptoms according 
to the rules of information propagation in the graph. 

Consider any AND node of the graph having the form $[n_1, n_2, \ldots, n_i] \to n$,
an OR node of the form $n_1 | n_2 | \ldots | n_i \to n$, and a NOT node of the
form n —# n'. Below a set of universal inference and information propagation
rules for a graph of the type AND/OR/NOT is specified, both for forward and
backward inference.

Backward inference rules for information propagation 

• the OR node, false: if a node $n$ of the type OR is false, then all its
predecessors $n_1, n_2, \ldots, n_i$ should be set to false,

• the AND node, true: if a node $n$ of the type AND is true, then all its
predecessors $n_1, n_2, \ldots, n_i$ should be set to true,

• the NOT node, true: if a node n' of the type NOT is true, then its 
predecessor n should be set to false, 

• the NOT node, false: if a node n' of the type NOT is false, then its 
predecessor n should be set to true. 

Forward inference rules for information propagation

• the OR node, true: if at least one of the predecessors
$n_k \in \{n_1, n_2, \ldots, n_i\}$ of an OR node is true, then the node $n$ should be set to
true,

• the AND node, false: if at least one of the predecessors
$n_k \in \{n_1, n_2, \ldots, n_i\}$ of an AND node is false, then the node $n$ should be set
to false,

• the NOT node, true: if the predecessor n of a NOT node is false, then 
the node n' should be set to true, 

• the NOT node, false: if the predecessor n of a NOT node is true, then 
the node n' should be set to false, 

• the OR node, false: if all the predecessors $n_1, n_2, \ldots, n_i$ of an OR node
are false, then the node $n$ should be set to false,

• the AND node, true: if all the predecessors $n_1, n_2, \ldots, n_i$ of an AND
node are true, then the node $n$ should be set to true.

The above rules are in fact deterministic ones, defining state propagation; 
they are fired any time their preconditions are satisfied, provided that their 
application produces a unique, consistent state. The current state of the graph
is defined by the pair of sets $(S^+, S^-)$ containing all the true and false
symptoms defined in the graph, and being the fixed point of the operation of
state propagation. Of course, only consistent states are taken into account; a 
state is consistent if any symptom is assigned either the true or false value, 
but no symptom can be marked true and false at the same time. The value 
of a symptom can be also undetermined. 
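
The deterministic rules listed above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the dictionary encoding of the graph, the names `propagate` and `Inconsistent`, and the toy graph in the usage note are hypothetical and do not come from the text. Undetermined symptoms are simply absent from the state.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: node -> (kind, predecessors), kind in {"AND", "OR", "NOT"}.
# Input symptoms are the names that never appear as keys.
Graph = Dict[str, Tuple[str, List[str]]]
State = Dict[str, bool]  # known values only; a missing key means "undetermined"


class Inconsistent(Exception):
    """Raised when a symptom would have to be true and false at the same time."""


def _assign(state: State, node: str, value: bool) -> bool:
    """Set node to value; return True if the state actually changed."""
    if node in state:
        if state[node] != value:
            raise Inconsistent(node)
        return False
    state[node] = value
    return True


def propagate(graph: Graph, known: State) -> State:
    """Fire the forward and backward rules until a fixed point is reached."""
    state = dict(known)
    changed = True
    while changed:
        changed = False
        for n, (kind, preds) in graph.items():
            vals = [state.get(p) for p in preds]
            if kind == "NOT":
                if vals[0] is not None:                  # forward NOT rules
                    changed |= _assign(state, n, not vals[0])
                if n in state:                           # backward NOT rules
                    changed |= _assign(state, preds[0], not state[n])
            elif kind == "OR":
                if any(v is True for v in vals):         # forward OR, true
                    changed |= _assign(state, n, True)
                elif all(v is False for v in vals):      # forward OR, false
                    changed |= _assign(state, n, False)
                if state.get(n) is False:                # backward OR, false
                    for p in preds:
                        changed |= _assign(state, p, False)
            elif kind == "AND":
                if any(v is False for v in vals):        # forward AND, false
                    changed |= _assign(state, n, False)
                elif all(v is True for v in vals):       # forward AND, true
                    changed |= _assign(state, n, True)
                if state.get(n) is True:                 # backward AND, true
                    for p in preds:
                        changed |= _assign(state, p, True)
    return state
```

For a toy graph with `m = AND(a, b)`, `a = OR(d1, d2)` and `b = NOT(d3)`, calling `propagate` with only `m` set to true fixes `a` and `b` to true and `d3` to false, while `d1` and `d2` stay undetermined: an OR node that is true has no deterministic backward rule.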

16.5.3. Diagnostic reasoning; abduction 

From a logical point of view, the search for a solution of some diagnostic prob- 
lem is a kind of abductive reasoning, i.e., it is backward reasoning. Putting 
aside the presented rules for backward inference, the rules of abductive infer- 
ence have the following pattern: 



$$\frac{\beta, \quad \alpha \Rightarrow \beta}{\alpha} \qquad (16.21)$$



Abductive inference rules allow reasoning about possible premises (or
causes) for a given conclusion if some implication(s) containing this
conclusion is given. Abductive reasoning is not a valid inference paradigm; it
consists rather in the formation of hypotheses explaining or justifying the
conclusion, but constituting only potential causes or justification.¹

¹ Note that abduction was in fact successfully used by Sherlock Holmes; contrary to
what is claimed in the books (that he used deduction), most of his clever reasoning
consisted in hypothesizing about causes explaining the observed results. A similar
approach can be used in diagnostic reasoning.

In the case of AND/OR/NOT causal graphs there are two abductive
inference rules; in fact, they are rules for a backward search of the graph, and 
they allow the generation of potential alternative solutions. 



Abductive inference rules 

• the OR node, true: if an OR node $n$ is true, then at least one of its
predecessors $n_k \in \{n_1, n_2, \ldots, n_i\}$ must be true; if not, one of them must be
selected and set to true,

• the AND node, false: if an AND node $n$ is false, then at least one of its
predecessors $n_k \in \{n_1, n_2, \ldots, n_i\}$ must be false; if not, one of them must be
selected and set to false.



The above rules can be expressed with the following schemes of abductive 
inference: 



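$$\frac{n, \quad n_1 \vee n_2 \vee \ldots \vee n_i \Rightarrow n}{n_k} \qquad (16.22)$$

and

$$\frac{\neg n, \quad n_1 \wedge n_2 \wedge \ldots \wedge n_i \Rightarrow n}{\neg n_k} \qquad (16.23)$$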



The application of any of the above rules results in a change of the state 
of the graph; a maximal propagation of this new state should be carried out 
then with the use of deterministic inference rules for information propagation; 
if an inconsistent state is produced, it should be abandoned and backtrack- 
ing should take place. The search procedure can be any systematic search 
procedure (e.g., breadth-first, depth-first) or a heuristic one. 

The initial state of the graph is defined by the diagnostic problem of
the form $(G, M^+, M^-, N^+, N^-)$. As a result of applying the search procedure
based on abductive inference rules, information propagation rules and
the elimination of inconsistent states, some hypotheses explaining the failure 
and solving the diagnostic problem are generated. The obtained explanations 
constitute potential diagnoses. 
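
The backward generation of hypotheses can be sketched in Python as a recursive expansion. This is a simplification of the search described above, not the procedure itself: it applies only backward rules (the abductive rules at OR-true and AND-false nodes, the deterministic ones elsewhere), performs no forward propagation, and ignores the observation sets $N^+$ and $N^-$. The graph encoding and the names `explain` and `merge` are hypothetical.

```python
from itertools import product
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Graph = Dict[str, Tuple[str, List[str]]]            # node -> (kind, predecessors)
Diagnosis = Tuple[FrozenSet[str], FrozenSet[str]]   # (D_plus, D_minus)


def merge(parts: Iterable[Set[Diagnosis]]) -> Set[Diagnosis]:
    """Pick one explanation per predecessor, join them, drop inconsistent joins."""
    out: Set[Diagnosis] = set()
    for combo in product(*parts):
        plus = frozenset().union(*(p for p, _ in combo))
        minus = frozenset().union(*(m for _, m in combo))
        if not plus & minus:            # an element may not be both true and false
            out.add((plus, minus))
    return out


def explain(graph: Graph, node: str, value: bool) -> Set[Diagnosis]:
    """Return alternative assignments of input symptoms that give node this value."""
    if node not in graph:               # input symptom (elementary diagnosis)
        if value:
            return {(frozenset([node]), frozenset())}
        return {(frozenset(), frozenset([node]))}
    kind, preds = graph[node]
    if kind == "NOT":
        return explain(graph, preds[0], not value)
    alternatives = [explain(graph, p, value) for p in preds]
    if (kind == "OR") == value:
        # OR-true or AND-false: abductive step, one predecessor is selected
        return set().union(*alternatives)
    # OR-false or AND-true: deterministic step, every predecessor must comply
    return merge(alternatives)
```

With the hypothetical graph `m = AND(a, b)`, `a = OR(d1, d2)`, `b = NOT(d3)`, the call `explain(graph, "m", True)` yields two alternative hypotheses: `d1` true with `d3` false, and `d2` true with `d3` false. In a full implementation each hypothesis would still have to be propagated forward and checked against the observations.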

Let $D = (D^+, D^-)$ denote some assignment of logical values to elementary
diagnoses. If the state $S = (S^+, S^-)$ can be obtained from $D$ with the
use of propagation rules, this fact is denoted by $D \vdash S$. Since all information
propagation rules represent valid logical inference, if $D \vdash S$, then also
$S$ is a logical consequence of $D$ ($D \models S$). Hence, $D$ constitutes a correct
explanation of the state $S$.



Definition 16.10. Let $(G, M^+, M^-, N^+, N^-)$ be a diagnostic problem. A
diagnosis $D$ (a solution of the diagnostic problem) is any pair of the form
$(D^+, D^-)$ of the sets of true and false input symptoms (of elementary
diagnoses assigned the value true or false, respectively), satisfying the following
conditions:

• $(D^+, D^-) \vdash (M^+, M^-)$, i.e., the diagnosis explains all the symptoms
indicating a fault,

• if $(S^+, S^-)$ is a state implied by the diagnosis $(D^+, D^-)$, then such a
state must be consistent, i.e., $S^+ \cap S^- = \emptyset$,

• each such state is consistent with the observations, i.e., $N^+ \cap S^- = \emptyset$
and $N^- \cap S^+ = \emptyset$.

Moreover, most frequently the analysis is restricted to minimal diagnoses,
i.e., such that no other pair of the sets $(D_0^+, D_0^-)$, where $D_0^+ \subseteq D^+$
and $D_0^- \subseteq D^-$, can be a diagnosis.



Consider a simple example of a diagnostic problem for the graph presented
in Fig. 16.10. Some example solutions to the problem defined by $M^+ = \{m_1\}$,
$M^- = \emptyset$, $N^+ = \emptyset$, $N^- = \emptyset$ are shown in Fig. 16.11.




Fig. 16.11. Example solutions of the diagnostic problem shown in Fig. 16.10 

The presented graphs show not only the generated potential diagnoses,
i.e., $D_1$ (two true elementary diagnoses, $D^- = \emptyset$), $D_2 = (D^+ = \{d_1\}, D^- = \emptyset)$
and $D_3$ (a single true elementary diagnosis, $D^- = \emptyset$), but also the way of
inferring the state of the graph (including manifestations). Note that the
diagnosis $D_1$ is not a minimal diagnosis; however, in certain situations it may
be worth considering since the derivation graph is significantly different from
the one for the diagnosis $D_2$ (and does not cover it). One can notice that for
this example problem there exist four minimal diagnoses but also six
independent diagnoses providing a potential solution corresponding to a
minimal solution graph defining the necessary derivation.

16.5.4. Solution analysis and diagnoses verification 

The diagnoses found with the use of the search algorithm applied to the 
causal graph and satisfying the conditions stated in Definition 16.10 con- 
stitute a correct explanation of the observed misbehavior. However, they 
provide only potential solutions. Even if the analysis is restricted to minimal
diagnoses only (in the sense of set inclusion), for current observations usually
more than one solution can be obtained. From a purely logical point of view, 
such solutions constitute a disjunction, and if the analyzed graph provides 
a complete specification of the causal dependencies in the system, then (at 
least) one of them will constitute a real diagnosis. 

Typically, it is assumed that single fault diagnoses representing the faults 
of a single system component are more likely than the multiple fault ones. The
first step of the analysis of potential diagnoses can then consist in the selection
of single-fault diagnoses and their verification. Moreover, if some heuristic,
qualitative or statistical information concerning the failure frequency of the
analyzed components is accessible, the elementary diagnoses can be ordered
from the most likely ones to the least probable ones. Verifying the diagnoses
consists in checking the components assigned to the elementary diagnoses. 

If there exists a possibility of obtaining some further information on the
state of the analyzed system in the form of logical values of some further
symptoms given by the sets $V^+$ and $V^-$, then such auxiliary data can be
useful in diagnoses verification by their application to:

• the elimination of certain diagnoses; if $D$ is a diagnosis and
$S = (S^+, S^-)$ denotes the state of the graph implied by this diagnosis ($D \vdash S$),
then the diagnosis $D$ may be eliminated if it is inconsistent with the auxiliary
data, i.e., there is $S^+ \cap V^- \neq \emptyset$ or $S^- \cap V^+ \neq \emptyset$,

• the confirmation of certain diagnoses; the degree of confirmation can be
calculated as the total number of elements in the sets $S^+ \cap V^+$ and $S^- \cap V^-$,
obviously if the diagnosis is not eliminated due to inconsistency as described
above.
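
Both uses of the auxiliary data reduce to a few set operations. The following Python sketch shows them under stated assumptions: the function name `confirmation_degree` and the symptom names in the usage note are hypothetical, and the state $S$ is assumed to have been computed beforehand from the diagnosis.

```python
from typing import Optional, Set, Tuple

State = Tuple[Set[str], Set[str]]  # (S_plus, S_minus) implied by a diagnosis


def confirmation_degree(state: State, v_plus: Set[str], v_minus: Set[str]) -> Optional[int]:
    """Return None if the diagnosis is eliminated, else its degree of confirmation."""
    s_plus, s_minus = state
    # Elimination: a symptom predicted true was observed false, or vice versa.
    if s_plus & v_minus or s_minus & v_plus:
        return None
    # Confirmation: count the auxiliary symptoms the state predicts correctly.
    return len(s_plus & v_plus) + len(s_minus & v_minus)
```

For a state with `a` and `b` true and `c` false, auxiliary observations `V_plus = {a}` and `V_minus = {c, e}` confirm two symptoms, whereas observing `c` as true eliminates the diagnosis.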

If there exist a large number of potential diagnoses, the necessity of ex- 
amining many detailed components may become both a tedious and costly 
task. This is why it may be worth considering some reasonable approach 
to improve its efficiency. In fact, one can use expert knowledge, experience, 
information on earlier failures, case-based reasoning and the available statis- 
tical data concerning the failure ratio of system components. However, the 
core approach can be based on the following simple and rational paradigm. 

Let $D_1, D_2, \ldots, D_l$ be the generated diagnoses, $D_i = (D_i^+, D_i^-)$, where
the sets $D_i^+$ and $D_i^-$ contain some number of elements. It is also assumed
that only minimal diagnoses are considered (with respect to set inclusion)
and that the diagnoses are consistent ($D_i^+ \cap D_i^- = \emptyset$). The core idea of
improving the efficiency of verification consists in a sequential examination of
system components responsible for elementary diagnoses; during each step, a
component should be selected in such a way that, regardless of the result of the
examination, the largest possible number of diagnoses is eliminated.

Two diagnoses $D_i = (D_i^+, D_i^-)$ and $D_j = (D_j^+, D_j^-)$ are inconsistent if
$D_i^+ \cap D_j^- \neq \emptyset$ or $D_i^- \cap D_j^+ \neq \emptyset$. For inconsistent diagnoses there exists an
element $d$ such that in one of the diagnoses it is true, and in the other it
is false; such an element will be referred to as a conflict element. Since the
result of the test is not known a priori, it seems reasonable to examine the
conflict element first: regardless of the result of the check, at least one of
the diagnoses will be eliminated.

Now, let $n^+(d)$ denote the number of diagnoses in which $d$ occurs taking
the value true (i.e., it belongs to $D_i^+$), and let $n^-(d)$ denote the number of
diagnoses in which it is false (i.e., it belongs to $D_i^-$). The smallest number
of eliminated diagnoses after testing $d$, independently of the result, is

$$r(d) = \min(n^+(d), n^-(d)), \qquad (16.24)$$

and it should be as big as possible.

Let d* denote the elementary diagnosis selected for verification; it seems 
reasonable that the selection should be made in such a way that 

$$r(d^*) = \max_{d \in D_1 \cup D_2 \cup \ldots \cup D_l} r(d). \qquad (16.25)$$

In the case when the selection of d* is not unique, some further criteria 
based on the probability of a fault or the test cost can be considered. 
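
The selection criterion can be computed directly from the list of diagnoses. The Python sketch below is illustrative only: the function name `select_test`, the alphabetical tie-break and the four diagnoses in the usage note are assumptions made for the example, not part of the method as described.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

Diagnosis = Tuple[Set[str], Set[str]]  # (D_plus, D_minus)


def select_test(diagnoses: Iterable[Diagnosis]) -> Tuple[str, Dict[str, int]]:
    """Return the element d* maximizing r(d) = min(n_plus(d), n_minus(d)), and all r values."""
    n_plus: Counter = Counter()
    n_minus: Counter = Counter()
    for d_plus, d_minus in diagnoses:
        n_plus.update(d_plus)    # d occurs with the value true
        n_minus.update(d_minus)  # d occurs with the value false
    elements = set(n_plus) | set(n_minus)
    r = {d: min(n_plus[d], n_minus[d]) for d in elements}
    # Ties are broken alphabetically here; the text suggests fault probability
    # or test cost as further criteria.
    best = max(sorted(r), key=r.get)
    return best, r
```

For the hypothetical diagnoses `({a}, {b})`, `({a, b}, {})`, `({b}, {a})` and `({c}, {a})`, the element `a` is true in two diagnoses and false in two, so `r(a) = 2`, while `r(b) = 1` and `r(c) = 0`. Testing `a` eliminates at least two diagnoses whatever the outcome.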

16.5.5. Extensions of the basic formalism 

The basic approach using logical causal graphs may be extended both by in- 
corporating mechanisms improving search efficiency and by extending knowl- 
edge representation and inference capabilities. 

The idea of the first extension consists in restricting the search by ob- 
taining some auxiliary information during the search; in other words, the 
examination of the graph becomes an interactive process. Auxiliary data can 
be obtained through observations, measurements, tests, etc.; in a general 
case one can say that auxiliary information can be obtained by diagnostic 
tests. Such a diagnostic test, performed in a state $S$, allows determining the
logical values of new symptoms. The result of a test $t$ made in the state
$S = (S^+, S^-)$ and on the assumption of the existence of an unknown fault
is a new state $S_t = (S_t^+, S_t^-)$; since the goal of such a test consists in obtaining
new information, at least one of the following relations should hold:
$S^+ \subset S_t^+$ or $S^- \subset S_t^-$.

In general, choosing the test should be performed in such a way that the 
number of symptoms whose logical values become known as the result of the 
test should be maximized (taking into account information propagation). As
a particular case, symptoms selected during applications of the rules of the
backward search can be tested; the test allows the elimination (or confirmation)
of the selected hypothesis. Some further remarks on the diagnostic test are
presented in (Fuster-Parra, 1996).

Another approach that improves search efficiency is the application
of the probabilities of symptoms for search ordering; both classical
probabilities determining the frequency of symptom occurrence and qualitative
probabilities (Fuster-Parra, 1996) determining only the relative
ordering of such frequencies can be considered. When this kind of information
is accessible, the most likely symptoms are selected first for the analysis
during the backward search.

Some further extension consists in using the concept of the so-called
functional causal graphs (Ligęza, 1997). The nodes of such graphs are no longer
restricted to represent logical variables; in fact, they can take more than two
values. A typical example is a variable taking three qualitative values, e.g.,
$\{-, 0, +\}$, where $0$ denotes the nominal state, $-$ denotes deviation below
the nominal value, and $+$ deviation above the nominal value. Most frequently,
for diagnostic purposes the sign of deviation considered as a qualitative
value is a useful diagnostic characteristic. Unfortunately, the incorporation
of variables taking several values results in replacing logical functions with
more complex ones and makes the search process even more difficult, as the
number of alternative input signals giving the observed output drastically
increases. Some further extensions may consist in admitting fuzzy
characteristics of faults and causal dependencies (Fuster-Parra and Ligęza, 1995a); it
allows more adequate modeling of real systems in which there exist failures
subject to gradual changes. Unfortunately, it also complicates the process of
the search for diagnoses.

The most crucial approach aimed at improving the efficiency of the diag- 
nostic process consists in introducing a hierarchical model of the diagnosed 
system (Ligęza and Fuster-Parra, 1998). It allows relatively fast generation of
the basic components of a diagnosis or the restriction of diagnostic reasoning 
to some area or subsystem. 

16.5.6. Example of application 

Consider an example of applying a logical causal graph to the diagnosis of a 
simple system. The system of interest is shown in Fig. 16.12. It is composed 
of a tank, a liquid supply and removal system, and a control system. For the 
sake of diagnostic reasoning, the following manifestation, intermediate and 
input symptoms (elementary diagnoses) are defined: 

Manifestations:
m - tank overflow.

Intermediate symptoms:
v1 - valve_open,
v2 - pump_off,
v3 - valve_stuck_in_open_position,
v4 - valve_open_by_control_signal,
v5 - pump_off_by_power_off,
v6 - pump_off_control_signal,
v7 - pump_blocked,
v8 - pump_on_control_signal,
v9 - valve_open_signal,
v10 - pump_on_signal_from_level_sensor,
v11 - pump_on_signal_from_control_system,
v12 - power_off,
v13 - valve_open_signal_from_level_sensor.

Input symptoms - elementary diagnoses:
d1 - valve_stuck_in_open_position_fault,
d2 - valve_open_control_signal_on,
d3 - level_sensor_on_when_level_too_high,
d4 - pump_on_by_control,
d5 - pump_broken_down_fault,
d6 - power_on.

Fig. 16.12. Example system to be diagnosed

Note that in the presented model the diagnoses may include both ele- 
mentary faults, the state of control signals and the status of environmental 
variables; therefore, the diagnoses can be very precise and apart from faults 
they may describe all circumstances of the failure. 

The operation of the system is relatively simple. During normal operation
the tank should contain some amount of liquid. If there is not enough liquid,
the control system opens the valve and the level increases until the level
sensor closes it. The pump is normally used to remove the liquid at the
end of the working cycle; however, in case of a failure (tank overflow) it can
be turned on in order to remove the liquid from the tank. Both the tank and
the pump are controlled by a programmed control system. For safety reasons
the system has double protection: in the case of tank overflow the valve is
closed and the pump is switched on. It is also assumed that the capacity of the
pump is greater than that of the valve.

Fig. 16.13. Logical causal graph for the analyzed system



In Fig. 16.13, a logical causal graph representing a model of the system for
diagnosing the case of tank overflow is presented. Moreover, in Fig. 16.13 the
nodes having logical values true are marked with black filled circles. In this
way a diagnostic problem is defined, where $M^+ = \{m\}$, and $N^+ = \{v2, d6\}$
is a set of auxiliary observations.

After applying a systematic search procedure for the presented graph the
following six diagnoses $D_i = (D_i^+, D_i^-)$ can be generated:

1. $D_1 = (\{d1\}, \{d3, d4\})$,

2. $D_2 = (\{d1, d5\}, \{\})$,

3. $D_3 = (\{\}, \{d3, d4\})$,

4. $D_4 = (\{d5\}, \{d3\})$,

5. $D_5 = (\{d2, d6\}, \{d3, d4\})$,

6. $D_6 = (\{d2, d5, d6\}, \{\})$.

Each of the diagnoses provides a different explanation of the failure; all
of them imply different subgraphs constituting a full causal explanation of
the diagnostic problem. However, with respect to set inclusion only four of
the diagnoses are minimal ones (i.e., $D_2$, $D_3$, $D_4$ and $D_6$). For the sake of
illustrating the problem of combinatorial explosion, it can be checked that
when the auxiliary observations ($N^+ = \{v2, d6\}$) are not available, there exist as
many as 14 potential diagnoses, and 6 of them are minimal ones with respect
to set inclusion.
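
Minimality with respect to set inclusion can be checked mechanically. The Python sketch below does so for the six diagnoses of the tank example, as read from the listing above; the function name `minimal` and the dictionary layout are hypothetical. A diagnosis is treated as non-minimal when some other diagnosis is contained in it component-wise, i.e., $D_j^+ \subseteq D_i^+$ and $D_j^- \subseteq D_i^-$.

```python
from typing import Dict, List, Set, Tuple

Diagnosis = Tuple[Set[str], Set[str]]  # (D_plus, D_minus)

# The six diagnoses of the tank example, transcribed from the listing.
TANK_DIAGNOSES: Dict[str, Diagnosis] = {
    "D1": ({"d1"}, {"d3", "d4"}),
    "D2": ({"d1", "d5"}, set()),
    "D3": (set(), {"d3", "d4"}),
    "D4": ({"d5"}, {"d3"}),
    "D5": ({"d2", "d6"}, {"d3", "d4"}),
    "D6": ({"d2", "d5", "d6"}, set()),
}


def minimal(diagnoses: Dict[str, Diagnosis]) -> List[str]:
    """Names of the diagnoses that contain no other diagnosis component-wise."""
    result = []
    for name, (plus, minus) in diagnoses.items():
        dominated = any(
            other != name and q <= plus and n <= minus
            for other, (q, n) in diagnoses.items()
        )
        if not dominated:
            result.append(name)
    return result
```

Here `minimal(TANK_DIAGNOSES)` returns `D2`, `D3`, `D4` and `D6`: both `D1` and `D5` contain `D3`, so they are not minimal, which agrees with the count of four minimal diagnoses given in the text.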

Consider once more the diagnosis of logical circuits on the basis of the
full adder discussed before. For a given combinational circuit there exists a
relatively straightforward way to build its model in the form of a logical
causal graph. In order to do this, one has to build a model of every gate, both
for correct and incorrect behavior; the faulty behavior appears to be dual to
the correct one (a mirror image). In order to differentiate between the two
modes of behavior, the description should be complemented with a single-bit
signal, taking the value 1 for the correct behavior and 0 for the faulty behavior.
The idea of this approach is schematically presented in Fig. 16.14.

Fig. 16.14. Idea of the approach to extend the description of
logical gates in order to cover the correct and faulty behavior 



Having defined the extended (correct and faulty) behavior it is possible to 
build the causal graph for the analyzed system using its schematic diagram in 
a straightforward, mechanical way. It is enough to build a logical causal graph 
for any extended model and connect the graphs according to the schematic 
diagram of the circuit. The diagnoses generated with the use of the graph 
include, as before, not only the specification of faulty elements, but also the 
values of input signals and a list of components which work correctly. For 
example, the approach applied to the full adder allows obtaining the following 
diagnoses: 

$D_1^+ = \{O1, X2\}$, $D_1^- = \{X1, A2\}$,

$D_2^+ = \{O1, A1, X2\}$, $D_2^- = \{X1\}$,

$D_3^+ = \{A2, X2\}$, $D_3^- = \{X1, A1, O1\}$,

$D_4^+ = \{X1, O1, A2\}$, $D_4^- = \{X2\}$,

$D_5^+ = \{X1, O1, A1\}$, $D_5^- = \{X2\}$,

$D_6^+ = \{X1\}$, $D_6^- = \{A1, A2, O1, X2\}$.



Some other application examples are presented in (Fuster-Parra, 1996). 



16.6. Comparison of selected approaches 

Knowledge engineering methods based on the use of a system model have 
many common features. The key idea is the possibility of predicting system 
behavior and comparing it with current observations. The above two classes 
of model-based diagnostic approaches have their foundations in (i) Reiter’s 
theory and consistency-based reasoning, and (ii) the use of causal dependen- 
cies and abductive inference. 

An important characteristic of Reiter’s theory and methods belonging to 
the first group is that they use only the model of the correct system behavior. 
In fact, it allows diagnosing systems when there is no expert knowledge and 
it does not require a training phase. What is also important is that not only 
single-fault diagnoses are generated, but multi-component ones as well. As a 
result of the diagnostic process all minimal potential diagnoses are generated. 
There are also some disadvantages; in its pure form, the approach requires 
the use of a theorem proving program, and this results in high computational 
complexity. Further, this theory is not very intuitive and following diagnostic 
inference is a difficult task. Moreover, Reiter’s theory does not provide an 
efficient method for systematic conflict generation in a general case; for a 
given class of systems an appropriate approach must be developed. 

In the case of abductive inference and methods of the second group one 
deals with a relatively simple and intuitive approach allowing us to trace the 
inference process. At any stage it is possible to direct diagnostic reasoning 
and take into account additional tests and observations. The process of
generating diagnoses is based on a sequential search through the causal graph
and may have an incremental character. There also exist a number of natural
extensions of this approach improving reasoning efficiency. The basic
issue is how to build the causal graph, although for certain systems
(combinational logic circuits) it is a straightforward task. Moreover, such a graph
can be modeled with the use of almost any rules, provided that they can
be interpreted backwards. For building a diagnostic inference module one
can use the existing tools, e.g., Prolog, both as a direct interpreter and as
a meta-language for encoding domain knowledge and reasoning rules. The
approaches based on the use of logical causal graphs also allow introducing
hierarchical diagnostic procedures, which seems to be of primary importance
when diagnosing more complex systems. They also allow an interactive
diagnosis, where the formulation of partial diagnostic hypotheses is interwoven
with their verification.

From a logical point of view it is important to notice that causal graphs
introduce the layer(s) of intermediate symptoms. Note that if there are given
some conflict sets $C_1, C_2, \ldots, C_k$, then any such set can be assigned an OR
node. In fact, if $C_i = \{c_1, c_2, \ldots, c_j\}$, then the conflict $C_i$ may be assigned
the formula $\psi_i = c_1 \vee c_2 \vee \ldots \vee c_j$. Recall that the generation of diagnoses in
Reiter’s theory consists in determining minimal hitting sets for the obtained
conflicts. Using the language of logical causal graphs, this corresponds to the
construction of one more layer containing exactly one AND node, described
by means of the formula $\psi_1 \wedge \psi_2 \wedge \ldots \wedge \psi_k$. A graph constructed as above
constitutes a simple tree structure allowing the generation of all potential
diagnoses. In fact, Reiter’s approach consists in searching a simple, two-level
logical causal graph of the AND/OR type, with a single AND node indicating
the existence of a failure and a number of OR nodes equal to the number of
the conflicts. On the other hand, logical causal graphs of the AND/OR/NOT
type allow the representation of much more complex structures, having many
intermediate layers, and thus the introduction of a hierarchical approach with
the incremental verification of partial diagnostic hypotheses.
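
The correspondence with minimal hitting sets can be made concrete with a brute-force Python sketch. It is an illustration only: the function name `minimal_hitting_sets` is hypothetical, the enumeration is exponential in the number of components, and the two conflict sets used in the usage note are an assumption (they are not listed in this section) chosen as a plausible pair for the full adder.

```python
from itertools import combinations
from typing import Iterable, List, Set


def minimal_hitting_sets(conflicts: Iterable[Set[str]]) -> List[Set[str]]:
    """Enumerate candidate sets by growing size; keep those hitting every conflict."""
    conflicts = [set(c) for c in conflicts]
    universe = sorted(set().union(*conflicts))
    found: List[Set[str]] = []
    for size in range(1, len(universe) + 1):
        for cand in combinations(universe, size):
            s = set(cand)
            hits_all = all(s & c for c in conflicts)        # the AND over all psi_i
            is_superset = any(h <= s for h in found)        # not minimal if it covers a smaller hit
            if hits_all and not is_superset:
                found.append(s)
    return found
```

Each conflict plays the role of one OR node $\psi_i$, and the requirement that every conflict is hit plays the role of the single AND node. For the assumed conflicts `{X1, X2}` and `{X1, A2, O1}` the function returns `{X1}`, `{A2, X2}` and `{O1, X2}`.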

To conclude, let us compare diagnoses generated using both approaches
for the case of the full adder. Recall that the application of Reiter’s
theory results in the following diagnoses: $D_1 = \{X1\}$, $D_2 = \{X2, A2\}$ and
$D_3 = \{X2, O1\}$.

In the case of a more complete system description covering also the incor- 
rect behavior of the system we have obtained as many as six diagnoses. They 
are more complex than the ones obtained using Reiter’s theory, but they are 
more precise; note that not every superset of a diagnosis in the sense of Reiter
is a correct diagnosis. The restriction of the diagnostic process to just
minimal diagnoses (with respect to set inclusion) represented by a listing of
faulty components may, from a practical point of view, be perceived as
too far-reaching a simplification, which does not specify the complete
explanation of failure circumstances.



16.7. Summary 

In this chapter selected methods of knowledge engineering with the appli- 
cation to the diagnosis of technical systems were presented in brief. These 
methods can be classified into expert ones, using the so-called shallow knowl- 
edge of an expert, being the result of experience and useful for diagnosis, 
and methods based on using a model of the system and not requiring expert 
knowledge but only the model of system behavior. Classical examples of
methods belonging to the first group are expert systems, with special
attention paid to rule-based systems. Methods of the second group can be
divided into those applying consistency-based reasoning and those using abductive
inference. Both subgroups use different models of the system. A typical example
of the consistency-based methods is Reiter’s theory. Abductive methods
use a model in the form of causal graphs. All presented methods differ with
respect to the type of the knowledge applied and the way of using it.

The presented groups of methods have well-defined theoretical founda- 
tions. However, for efficient application they require the adjustment to the 
specific type of the diagnosed system. These methods may serve as the core 
of an advanced diagnostic system, but in order to improve the efficiency they 
should be equipped with specific domain knowledge and heuristic knowledge. 
They may be also complementary to one another, and it seems reasonable to 
join expert knowledge based on experience with knowledge about the system 
model. It should allow the diagnosis of new, previously unknown failures, 
while the expert component should allow improving reasoning efficiency. 

Chapter 17



METHODS OF ACQUISITION OF 
DIAGNOSTIC KNOWLEDGE

Wojciech MOCZULSKI* 



17.1. Introduction 

Contemporary technical means, especially machinery and equipment, are becoming
more and more complex. Therefore, growing demands are placed on the
persons and organizations responsible for building (which includes
designing and manufacturing) and operating machinery and equipment.
To perform efficiently in each of the above-mentioned areas of activity,
one needs suitable knowledge and skills. Both are possessed by domain
experts, usually several in each domain. They acquire knowledge and
skills during their studies, long-term professional activity, by observations or
from technical literature.

It is worth focusing our attention on the following important issues: 

• Knowledge ambiguity. Knowledge about the operation and maintenance 
of machinery and equipment is ambiguous, incomplete, and contradictory. 
There exist divergent expert opinions on the same subject matter.

• Required promptness of operation. If early symptoms of an impending
catastrophic failure of a critical object or installation (such as a nuclear power
plant or a chemical plant) are encountered, a prompt reaction of a monitoring
system is required.

• Accessibility of an expert on the spot. An expert is rarely accessible on
site. Furthermore, he/she is usually not interested in sharing his/her own
knowledge and skills.

• Knowledge is precious. Moreover, if not passed on to others, it may 
be lost. For example, Pokojski (2000) stresses the fact that organizations 
dealing with designing machinery start to treat intellectual properties as a 
very valuable asset. Design knowledge is collected by subsequent generations 
of workers and, in fact, is not acquired, while blueprints stored in an archive 
are, to a large degree, information carriers. Meanwhile, this knowledge is 
needed to efficiently introduce new workers into their job. 

• Humans are exposed to a stream of messages. Those arrive in the form 
of a countless amount of data. It is nearly impossible to pick up messages 
that carry relevant information or to identify important regularities that may 
be recognized as qualitative or quantitative knowledge. 

All quoted arguments point to the need to replace a human expert by 
specialized intelligent information systems such as intelligent databases and 
expert systems. Their basic elements are knowledge bases, in which knowledge
that is required for aiding operation within the given (usually narrow) domain 
is stored. This is directly related to the subject matter of the present chapter, 
which deals with methods of acquiring diagnostic knowledge. 

The chapter is organized as follows. In Section 17.2 the problem of knowl- 
edge in technical diagnostics is discussed, the division of knowledge types into 
procedural and declarative ones is introduced and the most important knowl- 
edge sources are analysed. In Section 17.3 the basic methodological problem, 
which has been undertaken by the author and his colleagues from the De- 
partment of Fundamentals of Machinery Design of the Silesian University of 
Technology in Gliwice, Poland, is formulated. In Section 17.4, the fundamen- 
tal part of the chapter, methods of knowledge acquisition are dealt with ac- 
cording to the main classification into declarative and procedural knowledge. 
Special attention is paid to methods of knowledge acquisition from domain 
experts and to two groups of methods of automated knowledge acquisition: 
machine learning and knowledge discovery. The course of the process of ac- 
quiring declarative knowledge is also described. In Section 17.5 the applied 
means of aiding the process of knowledge acquisition are described, with at- 
tention focused on the means that have been developed by the author’s group. 
Some examples of the application of the described methodology and means of 
aiding the process of knowledge acquisition, which concern selected problems 
of technical diagnostics, are given in Section 17.6. Finally, potential issues 
concerning further research on knowledge acquisition in technical diagnostics 
are enumerated. 

17.2. Knowledge in technical diagnostics

With reference to a human expert, knowledge may be understood as a totality 
of what the given person knows. Knowledge in the given domain (Moczulski, 
1997) concerns objects (such as machines and their assemblies) and classes 
of objects belonging to this domain, the taxonomy of classes of objects, fea- 
tures of objects and classes of objects, relations between objects and their 
classes. This knowledge also includes skills, an understanding of general
principles, procedures of operation, etc. Knowledge is usually acquired
during long-term learning, for example during studies or specialist training
that takes place under the supervision of a teacher in the broad sense.
Moreover, in technical diagnostics it is advisable to consider also the
collected experience and skills. Both are usually obtained unaided and result
from a long-term activity of the expert in the given domain. In the following
part of this chapter the term knowledge will encompass both knowledge in the
usual sense and the practical experience of the expert. It should be
emphasized that knowledge acquisition by a human is usually a long-lasting
process.

Note that the diagnostic knowledge discussed so far concerns both facts and
processes. For this and other reasons, such as the ways of representing and
implementing knowledge, it is useful to distinguish declarative knowledge
from procedural knowledge.

17.2.1. Declarative knowledge 

Knowledge about facts, objects, relations between objects, classes of objects, 
features of these objects, etc. is represented in a declarative way and will 
further be called declarative knowledge. Note that the majority of work
done so far on the acquisition of diagnostic knowledge has dealt with the
representation and acquisition of declarative knowledge.

Declarative knowledge may be acquired either from experts or from
databases (Fig. 17.1). Experts may take part in the knowledge acquisition
process directly, or may be authors of publications, handbooks and
textbooks, design documentation, etc., which are carriers of knowledge
subsequently acquired in a separate process (in which the expert no longer
takes part).

Databases are another important source of declarative knowledge. In this 
case knowledge may be acquired in an automated way, with the use of either 
machine learning methods (if records in the database describe examples that 
are provisionally classified), or automatic methods of the discovery of either 
qualitative or quantitative (functional) dependencies. 

Fig. 17.1. Declarative diagnostic knowledge and its sources

17.2.2. Procedural knowledge

Declarative knowledge discussed so far is inadequate for aiding such processes
as, e.g., a comprehensive diagnostic examination of a given machine or
device. This examination may be considered as a sequence of specific
operations such as planning the diagnostic experiment to be performed on the
object, carrying out the observation of the object, processing diagnostic
signals recorded during the observation, diagnostic reasoning, etc. In this
case procedural knowledge is the subject of acquisition. This knowledge may be
represented in the form of procedures, and it has so far been acquired from
domain experts. They may take part in the process either directly or, as in
the case of declarative knowledge, may be authors of publications that are
subsequently analysed in order to extract procedural knowledge from them
(Wylezol, 2000).



17.3. Problem formulation 

In the discussed case the result of the knowledge acquisition process is the 
knowledge base corresponding to some (usually narrow) application domain 
within the scope of technical diagnostics. 

An analysis of the available descriptions of research concerning knowledge
acquisition in technical diagnostics has led to the following conclusions:

• knowledge is most frequently acquired from experts; moreover, an
additional participant called a knowledge engineer (or several such
engineers), who may have an important impact on its results, often takes part
in this process;

• applications of methods of machine learning from examples, provisionally
classified either by experts or by automatic classification systems, are
predominantly encountered;

• less intensive use is made of an expert's knowledge about the given
application domain;

• there is no generally acknowledged methodology of knowledge acquisition
in the technical diagnostics of machinery, equipment and processes.

Hence, the need was identified to develop a methodology of knowledge
acquisition in technical diagnostics that would take advantage of all basic
sources of knowledge, both declarative and procedural, and take into account
methods of verifying the acquired knowledge (Moczulski, 1997). It has been
stated that individual research tasks should address the selection of:

• data representation methods; 

• methods of representing both procedural and declarative knowledge; 

• proper methods of acquiring: 

- declarative knowledge: from experts and databases,

- procedural knowledge: from experts;

• methods of assessing and verifying the acquired knowledge. 

It has been decided that these methods would be selected with respect to 
their relevance to the needs of aiding a broad range of activities in the scope 
of the diagnostics of machinery and processes. 

Moreover, the problem formulated above includes the development of a 
system that might aid the carrying out of the complete process of knowledge 
acquisition. 



17.4. Selected methods of knowledge acquisition 

To solve the problem formulated above, comparative research on the known
methods of knowledge acquisition and assessment was required. New methods
were to be developed, too. The most useful methods are discussed below,
taking into consideration the general distinction between methods suitable
for declarative and procedural knowledge, respectively.

17.4.1. Methods of acquiring declarative knowledge 

The majority of research carried out by the author and his team (Ciupke, 
2001; Kostka, 2001; Moczulski, 1997; 1998; 1999; 2001a; Wylezol, 2000) con- 
cerned the acquisition of declarative knowledge. The applied methods corre- 
spond to knowledge sources, therefore one may distinguish two general groups 
of methods of knowledge acquisition: from experts and from databases. 

17.4.1.1. Methods of acquisition of declarative knowledge from experts

Experts are basic knowledge sources, hence it is usually impossible and in 
each case inadvisable to pass them over. Experts at least provide background 
knowledge about the application domain, which includes objects and their 
classes, relations between classes of objects and individual objects, essential 
features of objects, etc. 

Knowledge from experts may be acquired: 

• with the participation of a knowledge engineer, or 

• without his/her participation. 

The first method was used at the early stage of the development of
knowledge engineering (Buchanan et al., 1983). However, this method has many
disadvantages. For example, a misunderstanding between the expert and the
knowledge engineer may occur since the latter is not an expert in the
application domain. Another source of problems is that the knowledge engineer
has to interpret knowledge elicited from the expert, and to represent this
knowledge in the knowledge base.

The second method has been extensively developed (Moczulski, 1997;
Wylezol, 2000). It consists in equipping the domain expert with a proper aid
(usually software) that enables him/her to represent knowledge by
him/herself. Hence, the knowledge engineer may be eliminated from the
introductory stage of the process of knowledge acquisition.

17.4.1.2. Automated methods of acquisition of declarative knowledge 

Knowledge acquisition from domain experts is usually less efficient. For
example, if rules are to be acquired, an expert is able to write down a dozen
or possibly a few dozen rules during a single session of knowledge
acquisition. Furthermore, if new rules are added to the existing knowledge
base, several types of malfunctions may be encountered, such as cycles of
rules and contradictory or absorbing rules, whose identification may be
difficult (Cholewa and Pedrycz, 1987). Therefore, automated methods of
knowledge acquisition from sets of examples are becoming more and more
popular. For provisionally classified examples machine learning methods are
used, while methods of knowledge discovery may be applied to unclassified
data.

Data representation. In many practical tasks of knowledge acquisition in 
technical diagnostics an attribute model of data representation is used, where 
the dataset is represented by a matrix: 

$$
\begin{pmatrix}
a_{11} & \cdots & a_{1m} & d_{11} & \cdots & d_{1n} \\
\vdots &        & \vdots & \vdots &        & \vdots \\
a_{N1} & \cdots & a_{Nm} & d_{N1} & \cdots & d_{Nn}
\end{pmatrix}
\qquad (17.1)
$$

where $a_{ij}$, $i = 1, \ldots, N$, $j = 1, \ldots, m$, are values of m
condition attributes (e.g. values of symptom features observed for the object
to be classified), and $d_{ij}$, $i = 1, \ldots, N$, $j = 1, \ldots, n$, are
values of n decision attributes. In the general case this matrix is called an
information system, while for the frequently used case $n = 1$ the matrix
(17.1) represents the so-called decision table (Pawlak, 1991). The described
example, in which decision attributes occur, corresponds to the set of
classified examples, which may be applied to machine learning. However, a
more general case is also dealt with, when no data classification is defined,
which is equivalent to the lack of any decision attribute (i.e. $n = 0$).
Such a dataset may be applied to the knowledge discovery process.
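
The attribute model of (17.1) can be sketched in a few lines of Python. This is only an illustration: the class `DecisionTable`, the attribute names (`rms_velocity`, `temperature`, `state`) and all values are hypothetical and chosen to show N = 3 examples with m = 2 condition attributes and n = 1 decision attribute.

```python
from typing import NamedTuple


class DecisionTable(NamedTuple):
    """Attribute model of a dataset: N rows of m condition + n decision values."""
    condition_names: list   # names of the m condition attributes
    decision_names: list    # names of the n decision attributes (empty if n = 0)
    rows: list              # N rows, each of length m + n

    def split(self, i):
        """Return (condition values, decision values) of the i-th example."""
        m = len(self.condition_names)
        return self.rows[i][:m], self.rows[i][m:]


# Hypothetical data: N = 3, m = 2, n = 1 (a decision table).
table = DecisionTable(
    condition_names=["rms_velocity", "temperature"],
    decision_names=["state"],
    rows=[
        [1.2, 55.0, "good"],
        [4.8, 61.0, "still admissible"],
        [12.5, 83.0, "inadmissible"],
    ],
)

conditions, decisions = table.split(1)
```

With an empty `decision_names` list the same structure would hold unclassified data (n = 0) of the kind used for knowledge discovery.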

In practice, condition and decision attributes usually have quantitative 
values. However, the applied algorithms of knowledge induction require data 
representation by means of discrete (qualitative) attribute values. Therefore, 
the discretization of quantitative attributes is needed. Moreover, many at- 
tributes are collected during data acquisition, since in many cases it is diffi- 
cult to determine which attribute will carry the largest amount of diagnostic 
information. Then the selection of relevant attributes may take place. Both 
problems of the discretization of attributes and the selection of relevant ones 
are discussed by Ciupke (2001). 

Discretization of quantitative attributes. The task of discretization con- 
sists in reducing the amount of information by the transformation of attribute 
values from quantitative into qualitative ones. If one additionally requires 
that the converted discrete values be assigned meanings easily understood 
by human operators, only a few of such values should be used. In the de- 
scribed research usually from 2 to 5 such qualitative values have been ap- 
plied, each of them assigned a linguistic value easily understood by humans 
(Moczulski, 1997). 

To carry out discretization, q threshold values $v_i$, $i = 1, \ldots, q$,
have to be fixed. They may be determined in the following ways:

• Supervised - under the supervision of a domain expert, who in this
case may determine at least the number of thresholds, or even the values
of individual thresholds. The latter usually takes place when discretization
is connected with determining the critical values of the given attribute,
which may be used for diagnostic inference.

• Unsupervised, if thresholds are determined on the basis of the analysis of 
the complete decision table (Ciupke, 2001). Then the number of discretization 
thresholds is determined on the basis of statistical properties of individual 
sets of attribute values, or even on the basis of joint distributions of many 
attributes. 

The selection of values of discretization thresholds may be given a deeper
content-related meaning. Respective values (or intervals of values) may be
assigned additional qualitative meanings^. For attributes describing the
operating conditions of the object and its outputs, the classification of
values of these attributes is recommended. If it is possible, each class of
attribute values ought to be assigned a name that is understood by the
personnel operating the machine, equipment or installation. Attention should
be paid to the fact that discretization thresholds may be defined either in
an absolute or a relative way (see Table 17.1).



Table 17.1. Examples of defining linguistic values of attributes 



Attribute    Name                 Absolute values         Relative values
describing

input        load                 light, medium, heavy    reduced, normal,
                                                          increased

output       amplitude            low, medium, high       reduced, normal,
             of vibration                                 increased

state        overall vibration    good, sufficient,       deteriorated,
             condition            still permissible,      no change,
                                  inaccessible            improved


A question arises as to which way of determining discretization thresholds
should be used: the absolute or the relative one. The explanation provided
here concerns a state attribute. Thresholds are defined in the absolute way
if limit values of state attributes are known (Fig. 17.2(a)). Values defined
in such a way carry more information on the object to be diagnosed. However,
if these limit values are unknown, then threshold values may be determined in
the relative manner, for example, with respect to some assumed reference
state (Fig. 17.2(b)). In technical diagnostics individual thresholds are
usually determined on a logarithmic scale using a constant factor
$x = v_i / v_{i-1}$. For example, in the ISO standard No. 2372 the factor
$x = 2.5$ (8 dB) is used for the definition of limit values of subsequent
classes of state. It is worth emphasizing that a significant relative change
also carries valuable diagnostic information.
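
The constant-factor rule can be illustrated with a short Python sketch. The first threshold of 1.0 (in arbitrary units), the number of thresholds and the function names are assumptions made only for this example; the labels are borrowed from Fig. 17.2(a). A value equal to a threshold is assigned here to the upper interval, which is also an arbitrary choice.

```python
import math
from bisect import bisect_right


def log_thresholds(first, factor, count):
    """Thresholds v_1..v_q with a constant ratio v_i / v_(i-1) = factor."""
    return [first * factor ** i for i in range(count)]


def to_linguistic(value, thresholds, labels):
    """Map a quantitative value to the label of the interval it falls into."""
    if len(labels) != len(thresholds) + 1:
        raise ValueError("need exactly one more label than thresholds")
    return labels[bisect_right(thresholds, value)]


factor = 2.5
ratio_db = 20 * math.log10(factor)      # about 7.96 dB, i.e. roughly 8 dB

# Hypothetical first threshold of 1.0 and two thresholds in total.
thresholds = log_thresholds(first=1.0, factor=factor, count=2)
labels = ["good", "still admissible", "inadmissible"]
state = to_linguistic(1.8, thresholds, labels)
```

The quantity `ratio_db` shows where the rounded figure of 8 dB comes from: $20 \log_{10} 2.5 \approx 7.96$.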

Problems connected with the issue of the selection of discretization thresh- 
olds are very extensive. Instead of applying inequalities, one can introduce 
definitions of qualitative values by means of fuzzy sets, which may further 
be assigned linguistic values. These problems are discussed in detail, e.g., by 
Kacprzyk (1986). 

^ The adoption of interval representation is a kind of simplification, which in some cas- 
es is insufficient. For example, if the temperature of oil used for the lubrication of a 
hydrodynamic bearing is being considered, two qualitative values, adequate and inad- 
equate (i.e. either too low or too high), may be used. Then the interval representation 
cannot be used (Moczulski, 1997). 
[Figure 17.2: (a) a value axis divided by the thresholds marked alert and
danger into good condition, still admissible condition and inadmissible
condition; (b) an axis centred at 0 divided into large decrease, small
decrease, no change, small increase and large increase.]

Fig. 17.2. Determination of discretization thresholds in:
the absolute (a), and the relative (b) manner



There are many methods of determining discretization thresholds.
Comprehensive research on this subject was carried out by Ciupke (2001), who
performed a very detailed comparison of the usefulness of methods such as:

1. Single-attribute methods, which determine the discretization thresholds
of a single attribute only, while values of other attributes are not taken
into account. Among these methods the following were found to be useful in
technical diagnostics:

(a) method of an equal width of discretization intervals;

(b) method of an equal frequency in discretization intervals;

(c) ChiMerge method:

This method consists in an introductory partition of the domain of the given
attribute into subintervals; then selected subintervals are joined until the
stop criterion, which is defined by a minimal number of data in each interval
and by the threshold value of the χ² statistic, is reached;

(d) method of minimum entropy:

Entropy is calculated for a set of examples after the partitioning of this set
into subsets;

(e) method of single-attribute clustering:

Threshold values are determined after the clustering of attribute values with
the use of different approaches such as k nearest neighbours.

2. Multi-attribute methods, which determine the values of discretization
thresholds for all attributes simultaneously. Among them, the following
methods are especially useful in technical diagnostics:

(a) method of multi-attribute clustering:

The essence of this method is similar to that of the method of single-attribute
clustering described above;

(b) heuristic method (partially based on rough sets).

Comparative research on these methods applied to different diagnostic
experiments has made it possible to conclude that the best classification
results are obtained either by the heuristic method or by multi-attribute
clustering.
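
The two simplest single-attribute methods, (a) and (b), can be sketched as follows. This is a minimal illustration, not the implementation compared by Ciupke (2001); the function names and the sample values are hypothetical, and placing an equal-frequency threshold halfway between two neighbouring sorted values is one of several possible conventions.

```python
def equal_width_thresholds(values, intervals):
    """Split the range [min, max] into `intervals` parts of equal width."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / intervals
    return [lo + step * i for i in range(1, intervals)]


def equal_frequency_thresholds(values, intervals):
    """Place thresholds so that each interval holds about the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    if n < intervals:
        raise ValueError("need at least one value per interval")
    thresholds = []
    for i in range(1, intervals):
        cut = i * n // intervals
        # Threshold halfway between the two neighbouring sorted values.
        thresholds.append((ordered[cut - 1] + ordered[cut]) / 2)
    return thresholds


# Hypothetical attribute values with one outlier.
values = [1, 2, 3, 4, 5, 6, 7, 8, 100]
by_width = equal_width_thresholds(values, 3)
by_frequency = equal_frequency_thresholds(values, 3)
```

The outlier shows the practical difference: equal width leaves eight of the nine values in the first interval, whereas equal frequency puts three values into each interval.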

The selection of discretization thresholds may be optimized (Moczulski, 
1998). Most often a wrapper approach is used. It consists in evaluating the 
correctness of the selection of discretization thresholds by means of the overall 
error rate of the classifier obtained by induction from examples that have been 
discretized using the evaluated thresholds (Ciupke, 2001). 

It is worth emphasizing that the selection of values of discretization
thresholds may lead to the so-called overfitting of thresholds to the
learning dataset, which should be avoided.
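
A wrapper evaluation of candidate thresholds might look like the sketch below. It rests on several assumptions that are not taken from the cited works: a single quantitative attribute, a deliberately trivial classifier (the majority class of each interval), and a separate test set used to limit the overfitting mentioned above. All names and data are hypothetical.

```python
from bisect import bisect_right
from collections import Counter, defaultdict


def induce(train, thresholds):
    """Induce a trivial classifier: the majority class of each interval."""
    per_interval = defaultdict(Counter)
    for value, label in train:
        per_interval[bisect_right(thresholds, value)][label] += 1
    default = Counter(label for _, label in train).most_common(1)[0][0]
    model = {k: c.most_common(1)[0][0] for k, c in per_interval.items()}
    return model, default


def error_rate(model, default, test, thresholds):
    """Overall error rate of the classifier on the test examples."""
    wrong = sum(
        model.get(bisect_right(thresholds, value), default) != label
        for value, label in test
    )
    return wrong / len(test)


def best_thresholds(candidates, train, test):
    """Wrapper step: keep the candidate with the lowest test error rate."""
    def score(thresholds):
        model, default = induce(train, thresholds)
        return error_rate(model, default, test, thresholds)
    return min(candidates, key=score)


# Hypothetical (value, class) examples.
train = [(0.5, "good"), (0.9, "good"), (1.4, "good"), (3.1, "bad"), (4.0, "bad")]
test = [(1.1, "good"), (2.8, "bad"), (3.5, "bad")]
candidates = [[1.0], [2.0], [3.3]]
chosen = best_thresholds(candidates, train, test)
```

Any rule or tree induction method could be substituted for `induce`; the wrapper only needs the resulting error rate.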

Selection of relevant attributes. Many authors (Sobczak and Malina, 
1985; Ciupke, 2001) pay attention to the fact that if one is going to obtain 
a classifier with sufficiently high performance it is unnecessary to take into 
account all attributes describing the objects to be classified. A subset of 
relevant attributes may be selected from within the set of all attributes of 
these objects. Ciupke (2001) carried out comparative research of methods of 
attribute selection, which rely upon: 

(a) probabilistic features:

The approach consists in analysing either covariance or correlation matrices.

(b) application of rough sets:

The method consists in the determination of a minimal reduct (Pawlak, 1991),
which is the minimal subset of attributes that allows discerning all the
examples that belong to the dataset.

(c) genetic algorithms:

A chromosome is composed of zeroes and ones, which encode the list of
attributes, while the fitness function may be defined in various ways.

(d) minimum of entropy,

(e) artificial neural networks:

After a classifier (a neural network) has been trained on the set of examples
represented by means of the complete set of attributes, the weights assigned
to each node are analysed.

(f) inductive machine learning:

This method allows an indirect assessment of the selected set of attributes
through the evaluation of the overall error rate on the testing examples.

Research carried out with the use of data collected from several diagnostic
experiments has led to the conclusion (Ciupke, 2001) that the best results of
the classification of examples represented by values of previously selected
attributes are obtained with the method based on inductive machine learning.

Data sources. Data used in the knowledge acquisition process may originate
from different sources such as experts, observations (carried out during
experiments in the broadest sense, either active or passive ones), and
numerical experiments. Let us observe that:

• unclassified data may be collected in databases of different types; they 
may in particular come either from observations or from passive or numerical 
experiments, 

• classified data (examples) may originate from experts, observations, 
active or passive experiments, or numerical ones. 

Beside experts, databases are a common source of data in technical diag- 
nostics. They contain examples collected during active or passive diagnostic 
experiments carried out on real objects, or obtained as results of numerical 
experiments conducted with the use of appropriate simulation software. In the 
first case the values of condition attributes are determined by measurements 
and the analysis of signals. 

In the process of knowledge acquisition classified examples are very often 
used. The classification may be carried out either by an expert or automat- 
ically, e.g., by a diagnostic monitoring device that compares the values of 
attributes with some threshold values and then formulates a diagnosis con- 
cerning the technical state of the monitored object. 

Inductive machine learning methods. Diagnostic knowledge may be ac- 
quired inductively using the so-called machine learning methods (Michalski, 
1983; Pawlak, 1991; Quinlan, 1986). In the described research a selective in- 
duction of rules by the generation of covers (Michalski, 1983), the induction of 
rules with the use of rough sets (Pawlak, 1991) and the induction of decision 
trees (Quinlan, 1986; 1993) were efficiently applied. 

Selective induction of rules by the generation of covers. The task 
consists in determining descriptions of classes which Michalski (1983) calls 
hypotheses. A simplified description of knowledge representation will follow^. 
These class descriptions are represented in the form of complex logical condi- 
tions and may be interpreted as complex rule premises. A set of rules contains 
as many rules as there are distinguished classes of state. The way of repre- 
senting rules and their induction corresponds to the case of sharp rules, which 
are determined precisely. 

The basic element composing the rule r is a selector interpreted as an
elementary condition defined for the value $\mathrm{val}(a(o))$ of the
attribute a of the object o belonging to the domain $D(r)$ of this rule:

$$(\forall o \in D(r))\ [\mathrm{val}(a(o)) \propto \mathrm{eterm}(a)], \qquad (17.2)$$

where $\propto$ is an operator of the relation

$$\propto\ \in \{=, \neq, <, \leq, >, \geq\}, \qquad (17.3)$$

^ A complete description of knowledge representation is given by Michalski (1983); see 
also (Moczulski, 1997). 
and $\mathrm{eterm}(a)$ is an elementary term (i.e. constant), which should
be kept in relation with the values of the attribute a. To avoid redundancy
when rules are being represented, compound terms, which are disjunctions of
elementary terms, are considered:

$$\mathrm{Term}(a) = \mathrm{eterm}_1(a) \vee \mathrm{eterm}_2(a) \vee \cdots \vee \mathrm{eterm}_n(a). \qquad (17.4)$$

A simple condition or selector (Michalski, 1983) is the following condition:

$$(\forall o \in D(r))\ [\mathrm{val}(a(o)) \propto \mathrm{Term}(a)], \qquad (17.5)$$

which may be interpreted as the following disjunction of elementary
conditions:

$$(\forall o \in D(r))\ \{ [\mathrm{val}(a(o)) \propto \mathrm{eterm}_1(a)] \vee \cdots \vee [\mathrm{val}(a(o)) \propto \mathrm{eterm}_n(a)] \}. \qquad (17.6)$$

A conjunction of selectors according to (17.5):

$$(\forall o \in D(r))\ \{ [\mathrm{val}(a_1(o)) \propto_1 \mathrm{Term}_1(a)] \wedge \cdots \wedge [\mathrm{val}(a_m(o)) \propto_m \mathrm{Term}_m(a)] \} \qquad (17.7)$$

is called a complex (Michalski, 1983). The complex is a compound logical
condition defined for values of m different attributes assigned to the object
o to be classified that belongs to the domain $D(r)$ of the rule r. Finally,
a disjunction of complexes according to (17.7) represents a premise of the
compound classification rule, which is called a hypothesis by Michalski. The
k-th hypothesis is used for the classification of objects into the k-th class.
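
The nesting of selectors, complexes and hypotheses can be made concrete with a small Python sketch. It is an illustration under stated assumptions: objects are plain dictionaries, the function names (`selector`, `complex_`, `hypothesis`) and the attributes `vibration`, `load` and `temperature` are hypothetical, and a compound term is passed as a list of elementary terms.

```python
import operator

# Relation operators of (17.3).
RELATIONS = {
    "=": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}


def selector(attribute, relation, term):
    """Selector (17.5): the relation holds for at least one elementary term."""
    holds = RELATIONS[relation]
    return lambda obj: any(holds(obj[attribute], eterm) for eterm in term)


def complex_(*selectors):
    """Complex (17.7): conjunction of selectors."""
    return lambda obj: all(s(obj) for s in selectors)


def hypothesis(*complexes):
    """Hypothesis: disjunction of complexes, i.e. the premise of a class rule."""
    return lambda obj: any(c(obj) for c in complexes)


# Hypothetical premise for one class of state.
faulty = hypothesis(
    complex_(selector("vibration", "=", ["high"]),
             selector("load", "=", ["medium", "heavy"])),
    complex_(selector("temperature", ">", [80])),
)

fits = faulty({"vibration": "high", "load": "heavy", "temperature": 60})
```

Here `fits` is true because the first complex is satisfied; an object with low vibration and a temperature of 60 would satisfy neither complex.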

The algorithm of the induction of rules with the application of
covers (Michalski, 1983) is called $A^q$. In general, it consists in
generating the so-called covers $\Pi(E^+ \mid E^-)$ of the set $E^+$ of
positive examples of the given concept with the simultaneous exclusion of the
set $E^-$ of negative examples (counterexamples) of the concept. Simplifying,
one can assume that a cover may be a disjunction of complexes according to
(17.7).

For $K > 1$ classes (to which the hypotheses $\{R_k \mid k = 1, \ldots, K\}$
correspond respectively) the algorithm may be described as follows (Cholewa
and Pedrycz, 1987; Michalski, 1983):

(i) For each hypothesis $R_k$ there are created: a set $E_k^+$ of examples
mistakenly omitted (exceptions):

$$E_k^+ = E_k \setminus R_k, \qquad (17.8)$$

and sets $E_{kj}^-$ of examples that in fact belong to the j-th class, but
have been improperly included in the k-th class, where $j \neq k$ (a
commission error has been made):

$$E_{kj}^- = E_j \cap R_k, \qquad (17.9)$$

where $E_j$ denotes a set of all examples representing the j-th class, whereas
$R_k$ denotes a set of all examples for which the hypothesis $R_k$ is
satisfied (see Fig. 17.3).

Fig. 17.3. Dependencies between the analysed sets of examples $E_j$, $E_k^+$ and $E_{kj}^-$

(ii) For each $k = 1, \ldots, K$ a hypothesis $R_k^-$ is created as an
alternative that describes all misclassified examples:

$$R_k^- = \bigvee \ldots \qquad (17.10)$$

where

$$\ldots \bigcup (E_i \cup E_i^+)). \qquad (17.11)$$

(iii) New hypotheses $R'_k$ are created as covers:

$$R'_k = \Pi\Big(E_k \,\Big|\, \bigcup_{i \neq k} (R_i \setminus R_i^- \cup E_i^+)\Big). \qquad (17.12)$$

(iv) If for at least one value $k \in \{1, \ldots, K\}$ there still exist
misclassified examples (i.e. examples inconsistent with the respective
hypothesis), then for $k = 1, \ldots, K$ one shall substitute the hypotheses
$R_k$ with $R'_k$ and repeat the steps (i)-(iv).

The obtained rules may be further applied for classifying new data, unseen
so far by the learning system. If there exists the so-called total fit of
the premise of the given hypothesis $R_k$ to the new object o to be
classified, i.e., all logical conditions defined by selectors according to
(17.5) are satisfied, this object is classified into the k-th class. If for
each hypothesis there occurs a partial fit only (i.e., at least one
elementary condition expressed by a selector is not satisfied), then the way
of classifying this example depends on the method selected by the user of a
learning system. In this case the degrees of the membership of the classified
example in each of the analysed classes are calculated. This degree of
membership takes numerical values from the interval [0; 1]. A detailed
discussion of methods of determining the membership of the given example in
the classes considered was carried out by Grzymala-Busse (1994).

In the work (Moczulski, 1997) new inference schemes were introduced.
They prevent the classification of examples if the values of the membership
degree are too low. This allows introducing two additional results of
classification:

• impossibility to classify the example into any class (if all hypotheses
fit the example to be classified too poorly),

• impossibility to unambiguously determine the membership of the example
in some class (if more than one hypothesis fits the example considered
sufficiently well, and there are no significant differences between the
values of the degree of membership in different classes).
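
The two additional results of classification can be sketched as a decision over already computed membership degrees. This is not the scheme of Moczulski (1997) itself: the parameters `min_degree` and `min_margin`, their default values and the result names `"unclassified"` and `"ambiguous"` are assumptions introduced only for illustration.

```python
def classify(memberships, min_degree=0.5, min_margin=0.1):
    """Return a class name, "unclassified" or "ambiguous".

    memberships: dict mapping class name -> degree of membership in [0, 1].
    min_degree and min_margin are hypothetical tuning parameters.
    """
    ranked = sorted(memberships.items(), key=lambda kv: kv[1], reverse=True)
    best_class, best = ranked[0]
    if best < min_degree:
        return "unclassified"        # every hypothesis fits too poorly
    if len(ranked) > 1:
        runner_up = ranked[1][1]
        if runner_up >= min_degree and best - runner_up < min_margin:
            return "ambiguous"       # several hypotheses fit about equally well
    return best_class


clear = classify({"good": 0.9, "faulty": 0.2})
poor = classify({"good": 0.3, "faulty": 0.2})
tie = classify({"good": 0.8, "faulty": 0.75})
```

With the assumed defaults, `clear` is the class `"good"`, `poor` is `"unclassified"` and `tie` is `"ambiguous"`.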

Induction of rules with the use of rough sets. Rough sets (Pawlak, 
1991) have been introduced in order to represent inaccurate, uncertain and 
imprecise knowledge. A rough set is defined by a pair of sets - its lower 
and upper approximation, represented with the use of classes of indiscernible 
objects. The application of rough sets allows solving two crucial problems 
(Pawlak, 1991): 

• reduction of the number of redundant examples and redundant at- 
tributes, in order to obtain a minimal subset of attributes (called reduct) 
suitable for obtaining the highest performance of classification that is achiev- 
able by means of the accessible learning data, 

• acquisition of knowledge concerning the classification of these examples; 
this knowledge is represented by means of rules. 

To determine the lower and upper approximation of sets an indiscernibility
relation is introduced. Let us consider an information system S according to
(17.1). Let A be a set of all condition attributes, and D be a set of decision
attributes. Let $B \subseteq A \cup D$ be a subset of attributes, and let
$e_1, e_2 \in E$ be two different examples. These examples are indiscernible
by the subset of attributes B in the information system S, which is denoted
by $e_1 \tilde{B} e_2$, if the following is satisfied (Pawlak, 1991):

$$e_1 \tilde{B} e_2 \Leftrightarrow (\forall a \in B)\ [\mathrm{val}(a(e_1)) = \mathrm{val}(a(e_2))], \qquad (17.13)$$

where $\mathrm{val}(a(e))$ is the value of the attribute a for the example e.
The indiscernibility relation $\tilde{B}$ is an equivalence relation,
therefore it allows partitioning the set of examples into disjoint classes
$[e]_{\tilde{B}}$ of indiscernible elements.

Pawlak (1991) defines the B-lower approximation of a subset $Y \subseteq E$
of the set of all examples E as

$$\underline{B}Y = \bigcup \{ e \in E \mid [e]_{\tilde{B}} \subseteq Y \} \qquad (17.14)$$

and the B-upper approximation of the subset Y as

$$\overline{B}Y = \bigcup \{ e \in E \mid [e]_{\tilde{B}} \cap Y \neq \emptyset \}. \qquad (17.15)$$

The B-boundary of the set Y is defined as

$$BN_B(Y) = \overline{B}Y \setminus \underline{B}Y. \qquad (17.16)$$

The set $\underline{B}Y$ contains all examples that, on the basis of values
of the attributes $a \in B$, are certainly classified as elements of the set
Y. The set $\overline{B}Y$ includes all examples that, on the basis of values
of attributes from the subset B, may be classified as elements of the set Y.
Elements of the set $BN_B(Y)$ cannot be classified using the attribute subset
B either as elements of the set Y or as elements of its complementary set
$-Y$ (this set is also referred to as a doubtful area).
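
The approximations (17.14)-(17.16) are easy to compute once the indiscernibility classes are known. The sketch below assumes that examples are dictionaries and that sets of examples are represented by sets of row indices; the attribute names and the data are hypothetical.

```python
from collections import defaultdict


def indiscernibility_classes(examples, attributes):
    """Partition example indices into classes with equal values on `attributes`."""
    classes = defaultdict(set)
    for index, example in enumerate(examples):
        classes[tuple(example[a] for a in attributes)].add(index)
    return list(classes.values())


def approximations(examples, attributes, target):
    """Return the (lower, upper, boundary) approximations of the index set `target`."""
    lower, upper = set(), set()
    for cls in indiscernibility_classes(examples, attributes):
        if cls <= target:       # class entirely inside the target set
            lower |= cls
        if cls & target:        # class shares at least one element with it
            upper |= cls
    return lower, upper, upper - lower


# Hypothetical decision table; examples 0 and 1 are indiscernible by B.
examples = [
    {"vibration": "high", "noise": "high", "state": "faulty"},
    {"vibration": "high", "noise": "high", "state": "good"},
    {"vibration": "low",  "noise": "low",  "state": "good"},
    {"vibration": "high", "noise": "low",  "state": "faulty"},
]
faulty = {i for i, e in enumerate(examples) if e["state"] == "faulty"}
lower, upper, boundary = approximations(examples, ["vibration", "noise"], faulty)
```

In this example only example 3 is certainly faulty, while examples 0 and 1 form the doubtful area because they share the same attribute values but differ in state.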

The concepts introduced so far allow defining rough classification. Let
$B \subseteq A \cup \{d\}$ be a subset of the set of all attributes considered
and, moreover, let $\mathcal{E} = \{E_1, \ldots, E_K\}$ be the partition of
the set of examples $E = E_1 \cup \ldots \cup E_K$ into decision classes
corresponding to the values of the decision attribute d. Then the B-lower and
B-upper approximation of the classification $\mathcal{E}$ are defined
respectively as

$$\underline{B}\mathcal{E} = \{\underline{B}E_1, \ldots, \underline{B}E_K\}, \qquad (17.17)$$

$$\overline{B}\mathcal{E} = \{\overline{B}E_1, \ldots, \overline{B}E_K\}. \qquad (17.18)$$

For the decision table represented by the matrix (17.1) attribute selection
is carried out, giving subsets of relevant attributes called reducts. The
least numerous subset of attributes that still guarantees the classification
of all examples considered is called a minimal reduct. It is worth noticing
that there is usually more than one minimal reduct for an information
system S.
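
For small tables minimal reducts can be found by exhaustive search, as in the following sketch. It makes simplifying assumptions: the decision table is consistent, a single decision attribute is used, and a subset of attributes is accepted when examples that agree on it never differ in their decision. The function names, attribute names and data are hypothetical, and the search is exponential in the number of attributes.

```python
from itertools import combinations


def is_consistent(examples, attributes, decision):
    """True if examples equal on `attributes` never differ on `decision`."""
    seen = {}
    for example in examples:
        key = tuple(example[a] for a in attributes)
        if seen.setdefault(key, example[decision]) != example[decision]:
            return False
    return True


def minimal_reducts(examples, attributes, decision):
    """All smallest attribute subsets that still preserve the classification."""
    for size in range(1, len(attributes) + 1):
        found = [
            subset for subset in combinations(attributes, size)
            if is_consistent(examples, subset, decision)
        ]
        if found:
            return found
    return []


# Hypothetical discretized data.
examples = [
    {"load": "light", "vibration": "low",  "noise": "low",  "state": "good"},
    {"load": "light", "vibration": "high", "noise": "high", "state": "faulty"},
    {"load": "heavy", "vibration": "low",  "noise": "low",  "state": "good"},
    {"load": "heavy", "vibration": "high", "noise": "high", "state": "faulty"},
]
reducts = minimal_reducts(examples, ["load", "vibration", "noise"], "state")
```

The example yields two minimal reducts of size one, `("vibration",)` and `("noise",)`, which illustrates that a minimal reduct need not be unique.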

If the decision table contains only columns that correspond to attributes
belonging to some minimal reduct B, decision rules may be determined
(Pawlak, 1991):

$$\Big(\bigwedge_{a \in B} [\mathrm{val}(a(e)) = v_a]\Big) \Rightarrow [\mathrm{val}(d(e)) = v_d], \quad v_a \in \mathrm{Dom}(a),\ v_d \in \mathrm{Dom}(d), \qquad (17.19)$$

where $\mathrm{Dom}(a)$ stands for the domain of the attribute a. These rules
may be assigned certainty degrees, which are numbers from the interval [0; 1].

The set of decision rules obtained in the described way may be partitioned
into:

• a subset of certain rules generated on the basis of the lower
approximation of classification $\underline{B}\mathcal{E}$ according to
(17.14), and

• a subset of possible rules generated on the basis of the area of doubtful
classification:

$$BN_B(\mathcal{E}) = \{\overline{B}E_1 \setminus \underline{B}E_1, \ldots, \overline{B}E_K \setminus \underline{B}E_K\}. \qquad (17.20)$$

A primitive set of rules described above may be further analysed in order to
eliminate selectors which are defined for attributes that do not belong to the
given minimal reduct. After removing all such selectors a set of maximally
general rules is obtained (Pawlak, 1991).

Induction of decision trees. To acquire diagnostic knowledge from 
a set of classified examples, the method of the induction of decision trees 
(Quinlan, 1986; 1993) is often applied. 

The algorithm of the induction of a decision tree is recursive. For 
a decision table S = (£7, AU {d}) this algorithm may be explained as follows 
(Moczulski, 1997): 

1. If all examples remaining for classification belong to the same class Ek^ 
then create a leaf of the tree for this subset of examples and quit. 

2. Prom the set of attributes A, which have not yet been applied for parti- 
tioning the set of examples, select an attribute a and create a corresponding 
node and as many edges as many different values M(a) = card{y(a, E)} are 
taken by the condition attribute a in the set of examples E (cf. Fig. 17.4). 




Fig. 17.4. Node of a decision tree for the set of examples E
and the attribute a taking the values $v_1, \ldots, v_{M(a)}$



3. Represent the set of examples that remain for classification as a union
of M(a) subsets $E^{(1)}, \ldots, E^{(M(a))}$, where

$$E^{(m)} = \{e \in E \mid \mathrm{val}(a(e)) = v_m\}, \quad m = 1, \ldots, M(a), \tag{17.21}$$

and assign individual subsets to subsequent edges that start from the node 
created previously. 

4. Create a new set A' = A \ {a}, removing the attribute a from the set 
of attributes A. 

5. Apply the described algorithm recursively to the subsets
$E^{(1)}, \ldots, E^{(M(a))}$ and the set of remaining attributes A'.

The selection of an attribute that is to be used for expanding the
decision tree is an important problem, whose solution influences the
performance of classification. To select an attribute, we usually use the entropy
criterion, based on the amount of information contained in an event in which
the discretized attribute, interpreted as a discrete random variable,
may take one of its M(a) values. A detailed discussion of applicable criteria
is contained in (Quinlan, 1986; 1993). One can usually use:

• the gain criterion, which selects an attribute a that maximizes the
information gain obtained by partitioning the set of examples into subsets
corresponding to different values of this attribute;

• the gain ratio criterion, which enables one to select an attribute a that,
when used for partitioning the set of examples into subsets corresponding
to individual values of this attribute, produces a maximal relative gain of
information contained in this partition; in the opinion of J. R. Quinlan (1993)
this criterion is more robust than the gain criterion.
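
The two criteria can be illustrated with a minimal sketch. It assumes that
examples are stored as dictionaries with discretized attribute values and that
the decision attribute is kept under the key "d"; the attribute names
("vibration", "temp") and the helper functions are hypothetical and serve only
as an illustration, not as the implementation used by the cited authors.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def partition(examples, attribute):
    """Split the examples into subsets, one per value of the attribute."""
    subsets = {}
    for e in examples:
        subsets.setdefault(e[attribute], []).append(e)
    return subsets


def gain(examples, attribute, decision="d"):
    """Information gain obtained by partitioning on the attribute."""
    before = entropy([e[decision] for e in examples])
    after = sum(
        len(s) / len(examples) * entropy([e[decision] for e in s])
        for s in partition(examples, attribute).values()
    )
    return before - after


def gain_ratio(examples, attribute, decision="d"):
    """Gain divided by the entropy of the partition itself (split information)."""
    split_info = entropy([e[attribute] for e in examples])
    return gain(examples, attribute, decision) / split_info if split_info else 0.0


# Hypothetical toy data: "vibration" separates the classes, "temp" does not.
examples = [
    {"vibration": "high", "temp": "hot", "d": "fault"},
    {"vibration": "high", "temp": "cold", "d": "fault"},
    {"vibration": "low", "temp": "hot", "d": "ok"},
    {"vibration": "low", "temp": "cold", "d": "ok"},
]
best = max(["vibration", "temp"], key=lambda a: gain_ratio(examples, a))
```

In this toy set the attribute "vibration" is selected, since partitioning on it
leaves no uncertainty about the class, while partitioning on "temp" yields no
gain at all.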

Decision trees obtained for data which are represented by values of many 
attributes are usually more complex and overfit training data. To avoid this, 
the pruning of decision trees is applied. It consists in removing selected sub- 
trees (branches) and replacing them with terminal nodes (leaves). 

The algorithm of pruning decision trees may be explained as follows 
(Quinlan, 1993): 

1. start to prune the first leaf of the tree; 

2. test the smallest subtree that contains the leaf being analysed: 

(a) replace this subtree with a leaf; 

(b) estimate the classifier error; 

(c) if this error is acceptable, go to the parent node and if it differs 
from the root of the complete tree^, repeat the steps 2(a)-2(c) for 
a subtree whose root is this node; 

3. continue pruning (starting from the step 2) for the branch not pruned 
yet, starting from the next leaf (if it exists). 
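
The pruning procedure can be sketched as follows. This is a simplified
bottom-up variant written under several assumptions: the tree is stored as
nested dictionaries, each internal node remembers the majority class of its
training examples under the key "majority", and the classifier error is
estimated on a separate set of examples (a reduced-error style of pruning,
chosen here for brevity). The error estimate actually used in the cited work
may differ, and all names are hypothetical.

```python
def classify(node, example):
    """Follow the edges until a leaf is reached."""
    while "label" not in node:
        child = node["children"].get(example[node["attribute"]])
        if child is None:          # unseen attribute value
            return node["majority"]
        node = child
    return node["label"]


def error_rate(tree, examples, decision="d"):
    """Fraction of examples misclassified by the tree."""
    return sum(classify(tree, e) != e[decision] for e in examples) / len(examples)


def prune(tree, node, examples, max_error, is_root=True):
    """Prune the subtrees of `node` first, then try to collapse `node` itself."""
    if "label" in node:
        return
    for child in node["children"].values():
        prune(tree, child, examples, max_error, is_root=False)
    if is_root:
        return                     # collapsing the root would be trivial
    saved = dict(node)
    node.clear()
    node["label"] = saved["majority"]      # step (a): replace subtree with a leaf
    if error_rate(tree, examples) > max_error:   # steps (b) and (c)
        node.clear()
        node.update(saved)         # error not acceptable: restore the subtree


# Hypothetical tree: the "high" branch tests "temp" without any benefit.
tree = {"attribute": "vibration", "majority": "fault", "children": {
    "high": {"attribute": "temp", "majority": "fault", "children": {
        "hot": {"label": "fault"}, "cold": {"label": "fault"}}},
    "low": {"attribute": "temp", "majority": "ok", "children": {
        "hot": {"label": "ok"}, "cold": {"label": "fault"}}},
}}
validation = [
    {"vibration": "high", "temp": "hot", "d": "fault"},
    {"vibration": "high", "temp": "cold", "d": "fault"},
    {"vibration": "low", "temp": "hot", "d": "ok"},
    {"vibration": "low", "temp": "cold", "d": "fault"},
]
prune(tree, tree, validation, max_error=0.0)
```

After pruning, the "high" branch is a single leaf, whereas the "low" branch is
kept because replacing it with a leaf would misclassify one example.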

^ Arriving at the root of the tree is equivalent to substituting the complete tree by a
single leaf, which would be trivial (Moczulski, 1997).

Constructive induction of rules. Apart from selective induction described
previously, constructive induction may also be used. Its application is
connected with some intentional transformation of attribute representation space
and the creation of new attributes. New attributes may be created in a
supervised way on the basis of knowledge acquired from experts, who in diagnostic
reasoning apply composed attributes such as the simultaneous occurrence of
a group of attributes with respective values (Giordana et al., 1993). They
may also be obtained in an unsupervised way, automatically, as, e.g., in
the program AQ17-HCI. The obtained database usually makes it possible to
classify new examples with high accuracy. However, it is sometimes difficult
to interpret newly created attributes. For example, the AQ17-HCI program
constructs attributes that are defined by means of complex expressions
containing logical conditions.

Classifiers of states. In practical applications of technical diagnostics both 
elementary states and complex states are considered. An elementary state 
is one (Moczulski, 2001a) for which it is sufficient to give a value of only 
one state attribute to fully describe it. A complex state is considered as 
the simultaneous occurrence of more than one elementary state, hence with 
some simplification one can assume that values of at least two attributes are 
required if this state is to be represented. 

The contents of the knowledge base may be applied to the classification 
of examples represented by means of values of attributes that describe the 
object to be classified. For the sake of the further description it is reasonable 
to distinguish binary classifiers and multi-class ones (Cholewa and Kazmier- 
czak, 1995). Binary classifiers allow discerning between K = 2 states, while 
multi-class ones may discern K > 2 states. 

If only elementary states are being considered and the number of classes
to be discerned is K > 2, a multi-class classifier is used. It may also be
applied in the cases where complex states do occur. Then, new classes are 
to be created corresponding to each complex state. Nevertheless, such an 
approach requires supplementing new classes as the knowledge base is devel- 
oped due to taking a greater number of states into consideration. Usually new 
knowledge acquisition from the supplemented database is required. Another 
possibility would be the application of the so-called incremental learning 
(Michalski, 1983). 

To diagnose complex technical states, a family of classifiers may be
applied. Each classifier may be used for the identification of a value of one
attribute of state. Learning by the classifier is carried out using a set of all
examples partitioned into subsets:

$$E_k = \bigcup_{l=1}^{U_k} E_{kl}, \tag{17.22}$$

where

$$E_{kl} = \{e \in E \mid \mathrm{val}(X_k(e)) = x_{kl}\}, \quad l = 1, \ldots, U_k, \tag{17.23}$$

are subsets of the set of all examples that satisfy the condition
$\mathrm{val}(X_k) = x_{kl}$. Usually a subset of positive examples of the
elementary state $[\mathrm{val}(X_k) = x_{kl}]$, $1 \le l \le U_k$, and
counterexamples (corresponding to a lack of the given fault, for which
$[\mathrm{val}(X_k) \ne x_{kl}]$) are considered, which is equivalent to
considering a binary classifier. However, this approach has produced no
interesting results in the research carried out.

In connection with this result it has been suggested to build respective
hierarchical classifiers (Moczulski, 1997) that allow the sequential recognition
of elementary states, which simultaneously occur for the given complex state.
Such a classifier is a collection of multi-class classifiers (and possibly binary
ones). The structure of the classifier may be represented by means of a tree,
which is an extension of a tree of states (as, e.g., the tree shown in Fig. 17.5).
This exemplary tree corresponds to the case when the object is classified on
the basis of the values of its three attributes $X_1, X_2, X_3$, which may take
2, 3 and 4 values, respectively. A portion of knowledge (as a ruleset or a
decision tree) is assigned to each node of the tree of states. This portion
allows recognizing the value of the given state attribute. These classifiers are
applied sequentially, in the order determined by the structure of the tree of
states (starting from its root).



Fig. 17.5. Exemplary tree of states



Assessment of the classifier’s performance. The assessment usually consists
in applying the classifier to a set of test examples (the so-called wrapper
approach). Several techniques of the assessment of the classifier’s performance
are used. They usually consist in partitioning the set of all accessible
examples $E = E^L \cup E^T$ into the subset $E^L$ of learning examples and the
subset $E^T$ of test examples, the latter not to be used in the
learning phase of the classifier. Among several known techniques of assessing
the classifier’s performance, both the leave-one-out and random subsampling
ones were applied.

What plays the basic role in comparing classifiers is their performance
$\eta_{ov} = 1 - e_{ov}$, where

$$e_{ov} = \frac{n_{err}}{\mathrm{card}(E^T)}, \tag{17.24}$$

$E^T$ is the set of test examples and $n_{err}$ denotes the number of test
examples that were misclassified. The quantity $e_{ov}$ is an estimate of the
overall error rate of the classifier. Only classifiers with sufficiently high
performance are accepted. An acceptance threshold $\eta_{\min}$
($0 < \eta_{\min} < 1$) depends on the number of classes K to be
recognized (e.g., for K = 2 it is assumed that $\eta_{\min}$ = 85-90 percent,
while for a greater number of classes slightly smaller values of this threshold
are accepted). For a strongly unbalanced distribution of examples among
individual classes weighted error rates were introduced (Moczulski, 1997).

Apart from the overall error rate some partial error rates are used,
including the relative error (the number of examples belonging to this class that
were misclassified), the omission error (when a positive example of the given
class is wrongly classified to another class) and the commission error (when
a negative example of the class is classified to this class). These errors allow
identifying causes of lowering the performance of the classifier. Similarly
as in the case of the overall error rate, for a strongly unbalanced distribution
of a set of examples some weighted error rates were introduced (Moczulski, 1997).
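
The overall error rate (17.24) and the omission and commission errors of a
single class can be computed as in the sketch below. The lists of true and
predicted classes are hypothetical test data, and the partial errors are
returned as plain counts, which is an assumption made for simplicity.

```python
def overall_error_rate(true_classes, predicted):
    """Share of misclassified test examples, as in (17.24)."""
    n_err = sum(t != p for t, p in zip(true_classes, predicted))
    return n_err / len(true_classes)


def omission_commission(true_classes, predicted, cls):
    """Counts of omission and commission errors for the class `cls`."""
    pairs = list(zip(true_classes, predicted))
    omission = sum(t == cls and p != cls for t, p in pairs)    # positive missed
    commission = sum(t != cls and p == cls for t, p in pairs)  # negative accepted
    return omission, commission


# Hypothetical results of classifying five test examples.
true_classes = ["ok", "ok", "fault", "fault", "fault"]
predicted = ["ok", "fault", "fault", "ok", "fault"]

e_ov = overall_error_rate(true_classes, predicted)
performance = 1.0 - e_ov
omission, commission = omission_commission(true_classes, predicted, "fault")
```

Two of the five examples are misclassified here, so the overall error rate is
0.4 and the performance 0.6; for the class "fault" there is one omission error
and one commission error.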

Criteria for assessing the structure of a set of states. The tree of 
states represents diagnostic knowledge concerning the given class of objects. 
Knowledge about relationships between technical states of the given object 
may be acquired from an expert or learned from examples. Examples may be 
clustered with respect to the similarity of values of attributes, and results of 
clustering may then be presented to a domain expert in order to assign (or 
define) respective states to subsequent groups of examples. 

The field of possible solutions contains many different structures of the 
tree of states, hence there is the need for an optimal selection of a structure 
with respect to some criterion. Some criteria of the selection of an optimal 
tree follow. A separate problem is an adequate organization of search through
the field of possible solutions, whose cardinality is subject to combinatorial
explosion (due to the number of state attributes K and the cardinalities of the
domains of the state attributes $U_k$, $k = 1, \ldots, K$). An approach based on the
knowledge of an expert-diagnostician is recommended here.

A natural criterion, though of intermediate character, is the minimum 
value of the overall classification error. This error is calculated for each node 
of the tree. The overall error rate and partial errors of the complete hierar- 
chical classifier may be estimated by considering respective dependent events 
and calculating estimators of conditional probabilities. 

The minimum classification error is not the best criterion in every case. 
For example, an individualized risk of a wrong diagnostic decision may be 
used, where we would prefer structures of the tree which minimize the error 
of the recognition of some subset of the set of states (e.g., states whose occur- 
rence is connected with a significant hazard to the life and health of people 
and/or a hazard to the environment). The creation of respective optimization 
criteria may be a separate research problem as well. 

It is worth emphasizing that the optimization of the structure of a tree 
of states ought to be carried out in integration with the selection of relevant 
attributes (with respect to the given node of the tree of states) and a suitable 
selection of discretization thresholds of quantitative values (Moczulski, 2001a). 

17.4.1.3. Methods of discovering declarative knowledge 

Introductory research on the application of this group of methods was
described in the work (Moczulski and Zytkow, 1997). The problem of the
identification of knowledge about inverted relations was addressed there as well.

A machine (or automated) discovery of knowledge differs significantly
from machine learning, mostly because knowledge discovery methods yield
a new portion of knowledge (Zytkow, 1996), while machine
learning methods may be applied for the acquisition of knowledge already
discovered, and the goal of the learning process consists only in representing
this knowledge in the knowledge base of the given expert system.

Discovery of regularities. Diagnostic databases containing values of at- 
tributes that describe inputs and outputs of the observed real objects may 
be sources of useful knowledge about diagnostic relationships (Moczulski and 
Zytkow, 1997). The goal of knowledge discovery is to identify regularities 
that exist in the dataset contained in the database. A regularity is defined 
by some pattern and the range within which this pattern holds (Zytkow and 
Zembowicz, 1993). Examples of patterns are: contingency tables, equations
and logical equivalences. The range within which the pattern holds is defined 
as a complex condition which is a conjunction of simple conditions (such as 
inequalities). Each simple condition partitions the set of values of a single at- 
tribute into two subsets, of approximately the same cardinality. Apart from 
the discovery of regularities, in the case of databases containing a time series 
a joint approach is used, whose essence consists in searching for maxima and 
minima, which is accompanied by a search for regularities represented by 
equations (Zytkow and Zembowicz, 1993). 

For the needs of diagnostic inference, the best means are equations
(Moczulski and Zytkow, 1997):

$$X = f(Y, U), \tag{17.25}$$

where X denotes values of attributes of the technical state, U - values 
of attributes of the operating conditions of the object, and Y - diagnostic 
symptoms. If such dependencies do exist, they allow a precise and unique 
prediction of the object’s states. The results of measurements and observa- 
tions contain noise, are incomplete, rough or fuzzy, hence datasets contained 
in the databases are a carrier of incomplete information about quantitative 
dependencies. To identify regularities in the database, contingency tables are 
applied first (Zytkow and Zembowicz, 1993). They are the basic means of 
representing two-dimensional regularities. Other means of knowledge repre- 
sentation, such as equations, taxonomies, rules and concepts, may be regarded 
as special cases of these tables. 

Restricting our considerations to contingency tables, let us focus our
attention on two attributes $a, b \in A \cup \{d\}$ existing in an information
system S. One-dimensional histograms of their values may be represented by
means of matrices:

$$\mathrm{hist}(a) = [n_{a,1}, \ldots, n_{a,N(a)}], \tag{17.26}$$

$$\mathrm{hist}(b) = [n_{b,1}, \ldots, n_{b,N(b)}], \tag{17.27}$$

where N(a), N(b) are numbers of intervals of values of the attributes a, b
considered in each of these histograms. A 2-D contingency table for two
attributes $a, b \in A \cup \{d\}$ may be formally defined as the matrix product of
the histograms hist(a), hist(b):

$$\mathrm{Hist}(a, b) = \frac{1}{\mathrm{card}\{E\}}\, \mathrm{hist}(a)^{T}\, \mathrm{hist}(b). \tag{17.28}$$



Assessment of the significance of regularity. In order to allow an
automated operation of a KDD system, it is required to evaluate the significance
of the identified qualitative dependencies. The significance is evaluated using
the probability Q of the event that the analysed regularity is merely a
statistical fluctuation of two attributes, while both of these attributes have
random distribution. The smaller the probability, the more significant the
discovered regularity (Zytkow and Zembowicz, 1993). To determine the value of Q,
the $\chi^2$ statistic is calculated according to

$$\chi^2 = \sum_{i,j} \frac{(A_{ij} - E_{ij})^2}{E_{ij}}, \tag{17.29}$$



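where $A_{ij}$ are actual frequencies (from a sample), and $E_{ij}$ are frequencies
expected in the case when the null hypothesis about a lack of any dependence
between both attributes would be true. It means that for the whole
population, the joint distribution of values of the attributes a, b is a product
of marginal distributions of both these attributes:

$$E_{ij} = \frac{n_{a,i}\, n_{b,j}}{\mathrm{card}\{E\}}, \quad i = 1, \ldots, N(a), \quad j = 1, \ldots, N(b), \tag{17.30}$$

where $n_{a,i}$ and $n_{b,j}$ are the entries of the histograms (17.26) and (17.27).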

It has been stated empirically (Zytkow and Zembowicz, 1993) that values 
of Q < 10“^ ought to be used, thus there is a very low probability that the 
discovered patterns would arise randomly. Greater values of Q force very 
numerous regularities to be discovered, while in fact only some of them are 
interesting due to a small value of the significance measure. This phenomenon 
makes the further selection of regularities more difficult and lengthens the 
operation time of the discovery system. 

To evaluate the prediction power of the given contingency table Hist(a, b)
independently of both the number of degrees of freedom of this table and the
number of records, among other measures Cramer’s V is used, which is
defined as

$$V = \sqrt{\frac{\chi^2}{\mathrm{card}\{E\} \cdot \min\{(N(a) - 1), (N(b) - 1)\}}} \in [0, 1]. \tag{17.31}$$

The greater the value of V, the more unique predictions may be obtained on
the basis of the contingency table considered. For V > 0.9 the dependency
represented by the given contingency table may be regarded as equivalence
(Zytkow and Zembowicz, 1993).
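
The statistic (17.29) and Cramer's V (17.31) can be computed from a table of
actual frequencies as sketched below. The sketch assumes that the table is a
list of rows with non-zero row and column totals; the two example tables are
hypothetical. The probability Q itself requires the $\chi^2$ distribution and
is not computed here.

```python
from math import sqrt


def chi_square(table):
    """Chi-square statistic of a table of actual frequencies A_ij."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, a_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / n   # expected under independence
            chi2 += (a_ij - e_ij) ** 2 / e_ij
    return chi2


def cramers_v(table):
    """Cramer's V: chi-square normalized by the record count and table size."""
    n = sum(sum(row) for row in table)
    k = min(len(table) - 1, len(table[0]) - 1)
    return sqrt(chi_square(table) / (n * k))


# Hypothetical tables: a one-to-one dependency and complete independence.
equivalence = [[10, 0], [0, 10]]
independent = [[5, 5], [5, 5]]
v_equivalence = cramers_v(equivalence)
v_independent = cramers_v(independent)
```

For the first table V equals 1, so each value of one attribute determines the
value of the other; for the second table V equals 0 and no prediction is
possible.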

Discovery of functional dependencies. The discovery of strong qualita- 
tive dependencies between attributes in the form of contingency tables (i.e., 
obtaining sufficiently great values of the measure V) allows selecting attribute 
pairs for which there exists the greatest probability of discovering functional 
dependencies between these attributes. Moreover, it is possible to determine 
the range of this dependency, i.e., a subset of records on which the identified 
dependency likely holds. 

These equations are searched for only in a dataset in which a functionality 
relation holds, which is defined as follows: 

There is a set of pairs

$$D = \{(x_i, y_i) \mid x_i \in X \wedge i = 1, \ldots, N\}, \tag{17.32}$$

where y is a function of x iff for each $x_0 \in X$ there exists only
one unique $y_0$ such that $(x_0, y_0) \in D$.
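
The functionality condition can be checked directly on a list of pairs, as in
the sketch below. Exact equality of values is assumed; for noisy measurements a
tolerance would be needed, which is outside the definition given above. The
function name is hypothetical.

```python
def is_function(pairs):
    """True if every x in the pairs is accompanied by exactly one y."""
    seen = {}
    for x, y in pairs:
        # setdefault stores y for a new x and returns the stored y otherwise
        if seen.setdefault(x, y) != y:
            return False
    return True


functional = is_function([(1, 2), (2, 4), (1, 2)])
not_functional = is_function([(1, 2), (1, 3)])
```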

The search for equations is performed in the following steps (Zytkow and 
Zembowicz, 1993): 

(i) generate new function terms; 

(ii) select pairs of terms (as expressions for x and y, respectively); 

(iii) generate and test equations for each pair of terms. 

In the research carried out in the author’s department the BACON-3
methodology of Langley et al. (1987) is used. This approach consists in a
step-by-step generalization of simple equations. The discovery process is carried
out in an automatic way. To outline the procedure, let us consider a set

$$M = \{M_k \mid k = 1, \ldots, K\}, \quad K > 1, \tag{17.33}$$

of structures of parametric models that the KDD system matches to
the data. Each structure may be represented by the unique equation
$y = \varphi(x; a_1, \ldots, a_q)$, where $a_i$ denote variable parameters of the
model and q is the order of the model. Elements of the set M might be, e.g.,

• linear function $y = a_1 + a_2 x$ (let us notice that this model structure
includes also the constant function $y = a_1$);

• square function $y = a_1 + a_2 x + a_3 x^2$;

• logarithmic function $y = a_1 + a_2 \ln x$.

Let $\{(x_n, y_n, \sigma_n) \mid n = 1, \ldots, N\}$ be a dataset of values of two
attributes, the first of them (x) being a control (independent) attribute, while
the second one (y) being a dependent attribute, and let individual values of the
standard deviation $\sigma$ be known for each value of the attribute y. Such an
approach corresponds to the formulation of a standard regression analysis
problem (Volk, 1996). To evaluate the degree of fit of a given model structure,
the following version of the $\chi^2$ test (Zytkow and Zembowicz, 1993) may be
applied:

$$\chi^2 = \sum_{n=1}^{N} \left( \frac{y_n - \varphi(x_n; a_1, \ldots, a_q)}{\sigma_n} \right)^2, \tag{17.34}$$

where $y = \varphi(x; a_1, \ldots, a_q)$ is the mathematical model of a pattern, while
$a_1, \ldots, a_q$ are selected in such a way that the value of $\chi^2$ is minimized.
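
For the linear model structure the parameters that minimize (17.34) have a
closed form (weighted least squares), which the following sketch uses. The data
triples are hypothetical, and only the linear structure is covered; other
structures would need their own fitting routine.

```python
def fit_linear(data):
    """Weighted least squares for y = a1 + a2*x; data holds (x, y, sigma)."""
    s = sx = sy = sxx = sxy = 0.0
    for x, y, sigma in data:
        w = 1.0 / sigma ** 2           # weight of a single observation
        s += w
        sx += w * x
        sy += w * y
        sxx += w * x * x
        sxy += w * x * y
    delta = s * sxx - sx ** 2
    a1 = (sxx * sy - sx * sxy) / delta
    a2 = (s * sxy - sx * sy) / delta
    return a1, a2


def linear(x, a1, a2):
    return a1 + a2 * x


def chi_square_fit(data, model, params):
    """Degree of fit according to (17.34) for the given model and parameters."""
    return sum(((y - model(x, *params)) / sigma) ** 2 for x, y, sigma in data)


# Hypothetical measurements scattered around y = 1 + 2x, each with sigma = 0.1.
data = [(0.0, 1.1, 0.1), (1.0, 2.9, 0.1), (2.0, 5.1, 0.1), (3.0, 6.9, 0.1)]
a1, a2 = fit_linear(data)
chi2 = chi_square_fit(data, linear, (a1, a2))
```

Any other choice of parameters, e.g. (1.0, 2.0), gives a larger value of
$\chi^2$ for these data than the fitted pair.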

Let us assume that the set of examples E has been decomposed into
L > 2 slices:

$$E = E_1 \cup E_2 \cup \cdots \cup E_L, \tag{17.35}$$

where each slice $E_l$, $l = 1, \ldots, L$, is the subset of examples from the
database that correspond to some subset $Z_l$ of the domain of the attribute z:

$$E_l = \{e \in E \mid e = (x, y, z, \ldots) \wedge z \in Z_l\}, \quad \text{where} \quad \bigcup_{l=1}^{L} Z_l = \mathrm{Dom}(z). \tag{17.36}$$

To explain the essence of the method let us assume that in all analysed
slices $E_l$ of the database E respective contingency tables allow identifying
regularities of functional character that are sufficiently strong, and that
values of the attributes x, y in each slice satisfy the functionality test (17.32).

Let us further consider a single slice $E_l$ of the database E and the set
of structures of parametric models M according to (17.33). For this slice it
is possible to match to data the individual models $M_k$ and to evaluate the
degree of fit according to (17.34). The results of such an analysis for the l-th
slice may be stored in the following structure:

$$\{(k, a_k, \chi^2_{k,l}) \mid k = 1, \ldots, K\}, \tag{17.37}$$

where $a_k = (a_{1,k}, \ldots, a_{q_k,k})$ is a vector of parameters of the k-th model
and $\chi^2_{k,l}$ is the value of (17.34) obtained for this model in the l-th slice.

The generalization of functional dependencies (equations) consists in
introducing a new independent variable into the set of equations of the common
and alike structure $y = \varphi(x)$. The following stages may be distinguished in
this approach (Moczulski, 2001b):

1. Find a structure of the model (in the following denoted by $M_{k_0}$) which
features the greatest value of the degree of fit among all the slices of the
database; the following criterion of the best fit may, for example, be used:

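$$\chi^2_{k_0} = \min_{k} \chi^2_{k}, \quad \text{where} \quad \chi^2_{k} = \sum_{l=1}^{L} \chi^2_{k,l}, \tag{17.38}$$

and $\chi^2_{k,l}$ is the value of (17.34) for the k-th model in the l-th slice.

2. For the $k_0$-th model structure $M_{k_0}$, the matrix of values of model
parameters fitted to data in individual slices is obtained:

$$\big[a^{(l)}_{k_0}\big]_{l=1,\ldots,L} = \big[a^{(l)}_{1,k_0}, a^{(l)}_{2,k_0}, \ldots, a^{(l)}_{q_{k_0},k_0}\big]_{l=1,\ldots,L}, \tag{17.39}$$

while the s-th column ($s = 1, \ldots, q_{k_0}$) may be considered as new data, for
which functional dependencies that describe these data may be searched for:

$$\big\{\big(z_l, a^{(l)}_{s,k_0}, \sigma^{(l)}_{s,k_0}\big) \mid l = 1, \ldots, L\big\}, \tag{17.40}$$

where $\sigma^{(l)}_{s,k_0}$ denotes the standard deviation of the parameter
$a^{(l)}_{s,k_0}$, and $z_l$ is the value of the attribute z that represents the
l-th slice. For each parameter existing in the common model structure, functional
dependencies may be discovered using these data in an identical way. As
previously, the BACON-3 methodology is applied.

3. The generalized equation may be finally written down as

$$y = \Phi(x, z) = \varphi_{k_0}\big(x; \psi_1(z), \ldots, \psi_{q_{k_0}}(z)\big), \tag{17.41}$$

where $\psi_s(z)$ denotes a function that fits to data represented by the expression
(17.40); this function describes values of the s-th parameter of the model $M_{k_0}$.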

The process of generalization described above may be continued until all
possible attributes are included in the generalized equation. Values of the
standard deviation of the fitted parameters may be estimated using commonly
known formulae concerning the standard deviation of model parameters (Volk, 1996).
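
The generalization over slices can be illustrated with a short sketch. It makes
strong simplifying assumptions: only the linear model structure is considered,
all standard deviations are taken as equal (ordinary least squares), and each
slice is represented by a single value of z. It therefore shows the idea of
fitting parameters slice by slice and then fitting the parameters as functions
of z, not the full BACON-3 search. The data are synthetic and all names are
hypothetical.

```python
def fit_line(points):
    """Ordinary least squares for y = a1 + a2*x; returns (a1, a2)."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    delta = n * sxx - sx * sx
    return (sxx * sy - sx * sxy) / delta, (n * sxy - sx * sy) / delta


# Synthetic slices: for each z the data follow y = z + 2*z*x.
slices = {
    z: [(x, z + 2.0 * z * x) for x in (0.0, 1.0, 2.0, 3.0)]
    for z in (1.0, 2.0, 3.0)
}

# Stage 2: parameters of the common structure fitted in every slice.
params = {z: fit_line(points) for z, points in slices.items()}

# Each column of parameters becomes new data, described as a function of z.
a1_of_z = fit_line([(z, p[0]) for z, p in params.items()])
a2_of_z = fit_line([(z, p[1]) for z, p in params.items()])


def generalized(x, z):
    """Stage 3: the generalized equation y = a1(z) + a2(z) * x."""
    a1 = a1_of_z[0] + a1_of_z[1] * z
    a2 = a2_of_z[0] + a2_of_z[1] * z
    return a1 + a2 * x


prediction = generalized(1.5, 2.5)
```

The generalized equation reproduces the dependency also for a value of z that
was not present in any slice.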

It is worth emphasizing that the BACON-3 methodology allows discovering
some particular class of equations, although it is possible to consider
many different model structures during the discovery process. An important
advantage of the method is the possibility to automate the complete discovery
process, which has been achieved in the system Forty-Niner with the
module Equation Finder (Zytkow and Zembowicz, 1993). However, the automation
requires reducing to a minimum the necessity for a human operator’s
intervention, which in this case concerns only a proper selection of values of
parameters that control the complete processing. These parameters include, 
among other things, a minimal number of records in the slice of the database 
and critical values for different tests used for assessing the significance of the 
discovered quantitative and qualitative dependencies. To select these values 
properly, extensive knowledge and experience in the field of data analysis 
are required. 

The approach described above may be applied to both kinds of functional
dependencies that occur in technical diagnostics: forward (causal-consecutive)
dependencies and inverse (the so-called diagnostic) ones. In the
latter case two different approaches are possible (Moczulski and Zytkow, 1997):

(i) switching the roles between the independent and dependent attributes,
and then a direct discovery of inverse dependencies;

(ii) forward dependencies are discovered first, and then an attempt to
invert them is undertaken. The following methods of solving the corresponding
system of nonlinear equations may be used:

(a) solution of this system of equations in a symbolic way, using proper
software, such as MATLAB (Moczulski and Wachla, 2001);

(b) solution of the obtained overdetermined system of equations using SVD
(Singular Value Decomposition) (Wachla, 2001);

(c) solution of this set of equations using a genetic algorithm (Wachla,
2002).

17.4.1.4. Methods of assessing declarative knowledge 

There are many methods of assessing declarative knowledge, which may be 
applied for: 

1. Assessment in detail - if a single portion of knowledge is being assessed
(e.g., a single rule) and relations between the portion to be assessed and the
other ones are not taken into consideration;

2. Integral assessment - if a complete knowledge base is subject to assess- 
ment, while such an assessment may be carried out: 

(a) in relation to the content - if the content of the knowledge base is 
assessed, which may be carried out: 

• by experts - then it becomes a very difficult task since the expert 
must simultaneously take into account the whole knowledge base, 

• automatically - by means of test examples that allow assessing the 
completeness of the knowledge base, 

(b) taking into account the formal correctness - if the whole knowledge base 
is subject to assessment, but the goal consists in identifying contradic- 
tions, absorption or occurrence of loops; such an examination is carried 
out with the use of proper software. 
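
Two of the formal checks, the detection of contradictions and of absorption,
can be sketched for a simple rule base. The representation is an assumption
made for illustration: each rule is a pair consisting of a set of
(attribute, value) premises and a conclusion. The detection of loops requires
chaining of rules and is not covered here.

```python
def find_contradictions(rules):
    """Pairs of rules with identical premises but different conclusions."""
    return [
        (i, j)
        for i, (p1, c1) in enumerate(rules)
        for j, (p2, c2) in enumerate(rules)
        if i < j and p1 == p2 and c1 != c2
    ]


def find_absorbed(rules):
    """Pairs (i, j) where rule j is absorbed by the more general rule i."""
    return [
        (i, j)
        for i, (p1, c1) in enumerate(rules)
        for j, (p2, c2) in enumerate(rules)
        if i != j and c1 == c2 and p1 < p2   # p1 is a proper subset of p2
    ]


# Hypothetical rule base.
rules = [
    (frozenset({("vibration", "high")}), "fault"),
    (frozenset({("vibration", "high"), ("temp", "hot")}), "fault"),
    (frozenset({("vibration", "low")}), "ok"),
    (frozenset({("vibration", "low")}), "fault"),
]
contradictions = find_contradictions(rules)
absorbed = find_absorbed(rules)
```

In this rule base the last two rules contradict each other, and the second
rule is absorbed by the first one, which reaches the same conclusion from
fewer premises.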

17.4.2. Methodology of knowledge acquisition from databases 
using machine learning 

On the basis of several years’ experience, the author developed a methodology 
of knowledge acquisition with the use of machine learning (Fig. 17.6). A more 
exhaustive description of the methodology can be found in (Moczulski, 1999). 
In what follows some key stages of this methodology are discussed. The
procedure is recursive.

Since data may come from either measuring experiments (active or passive)
or numerical ones, a proper preparation of the experiment plan plays a
very important role. An appropriate plan of the experiment makes it possible
to obtain a correct distribution of examples in feature space, which basically
influences the possibility to obtain high classifier performance. An interesting
solution, which may be applied in the case of numerical experiments, is the
use of a random distribution of examples in feature space. The way of defining
the classes of state should be taken into account when deciding
upon the probability distribution of control attributes (Kostka, 2001).

Fig. 17.6. Methodology of acquiring declarative knowledge
with the use of machine learning

Due to the need to accomplish the selection of attributes it is recommended
that the acquired signals be stored, which makes multiple attempts at
selecting attributes possible. Particular attention should be paid to a proper
selection of measured quantities, so that considerably high values of the
signal-to-noise ratio would be obtained. The selection of values of attributes that
are symptoms of the fault being considered ought to be carried out on the
basis of general diagnostic knowledge about common causal-consecutive
relationships that concern the object in question. It is advantageous for further
stages to select attributes which are relevant for the considered set of
technical states (Ciupke, 2001). This selection may be a subject of optimization
(Moczulski, 1999).

Fig. 17.7. Way of representing procedural knowledge


To acquire knowledge, inductive machine learning methods are used,
including the induction of rules with the use of covers (algorithm $A^q$) and the
induction of decision trees. In the case of a complex structure of the set of
states it is advisable to apply a hierarchical classifier.

The assessment of the acquired knowledge is carried out with the use of 
such formal techniques as leave-one-out and k-fold (Moczulski, 1997). As a 
basic estimate the overall error rate according to (17.24) is applied. 
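
The leave-one-out technique can be sketched as follows. The learning procedure
is passed in as a function that takes the learning examples and returns a
classifier; the trivial majority-class learner used here is a hypothetical
stand-in for rule or tree induction, chosen only to keep the sketch
self-contained.

```python
from collections import Counter


def leave_one_out_error(examples, train, decision="d"):
    """Overall error rate estimated by leaving out one example at a time."""
    n_err = 0
    for i, test_example in enumerate(examples):
        learning = examples[:i] + examples[i + 1:]   # all examples but the i-th
        classifier = train(learning)
        if classifier(test_example) != test_example[decision]:
            n_err += 1
    return n_err / len(examples)


def train_majority(learning, decision="d"):
    """Stand-in learner: always predicts the most frequent class."""
    label = Counter(e[decision] for e in learning).most_common(1)[0][0]
    return lambda example: label


# Hypothetical set of examples: four of class "ok" and one of class "fault".
examples = [{"d": "ok"}, {"d": "ok"}, {"d": "ok"}, {"d": "ok"}, {"d": "fault"}]
e_ov = leave_one_out_error(examples, train_majority)
```

Only the single "fault" example is misclassified when it is left out, so the
estimated overall error rate is 0.2.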



17.4.3. Method of acquisition of procedural knowledge 

A selected and cognitively new task is the acquisition of procedural
knowledge (Wylezol, 2000). Procedural knowledge may especially concern
procedures of operation (or acting), but it may also concern reasoning procedures
and, first and foremost, procedures of drawing conclusions. A block diagram is
a convenient way of representing procedural knowledge, and one that engineers
are accustomed to. Domain experts are the basic source of procedural knowledge;
they may take either a direct or an indirect part in the process of knowledge
acquisition. In the latter case the experts may act as authors of publications,
from which procedural knowledge may be further acquired and represented in
a procedural knowledge base. The essence of the developed method of the
acquisition of procedural knowledge (Wylezol, 2000) consists in avoiding, as far
as possible, the participation of the knowledge engineer in this process. This
can be achieved if the expert has at his/her disposal some suitable editor of
procedural knowledge, which not only supports him/her during the representation
of his/her own knowledge, but also enables the expert to become aware of
the knowledge that is in his/her possession.

Wylezol (2000) proposed the top-down approach, which in this case consists
in the step-by-step increasing of the level of detail of the block diagram.
According to the assumed method, he used a multi-layer block diagram
for representing procedural knowledge. In this case knowledge representation
consists in expanding individual tasks into more and more detailed
subprocedures (Fig. 17.7), while each consecutive layer corresponds to an increased
level of detail of the representation of this procedure.

17.4.4. Scenario of the process of acquiring declarative knowledge 

Methods of knowledge acquisition and methods of assessing the acquired 
knowledge described so far are the basis for practical operations that, when 
considered as a sequence of operations, are referred to as the knowledge acqui- 
sition process. The author proposed a scenario of the knowledge acquisition 
process that may help to plan and carry out this process. A model of this 
process is shown in Fig. 17.8 (Moczulski, 1997). 

In this model the following stages may be identified (Fig. 17.8): 

(i) conceptual project, 

(ii) elaboration of a prototype of the knowledge base, 

(iii) elaboration of a complete version of the knowledge base, 

(iv) transfer of the knowledge base for usage as well as the author’s surveil- 
lance and a periodic evaluation of its performance. 

This scenario makes proper planning and realization of the process of
knowledge acquisition easier. Basic stages typical for each creative process
may be easily found, therefore many authors, starting from Buchanan et
al. (1983), and also including Cholewa and Pedrycz (1987), apply the
expression constructing to the totality of activities connected with building
a knowledge base.



17.5. Aiding means of the knowledge acquisition process 

The methods of knowledge acquisition described previously constituted the
logical foundation for developing and putting into practice several means of
aiding the knowledge acquisition process. As the research was in progress,
attention was focused on the integrating meaning of the common format of
data and knowledge representation. This perception has been the foundation
for the elaboration of a logical scheme EMPREL of the data and knowledge
base (Moczulski, 1997). This common format and the base itself have made
the integration of several tools into a knowledge acquisition system possible.
In what follows some of these means are briefly presented.

Fig. 17.8. Scenario of the process of acquiring declarative knowledge


17.5.1. Data and knowledge base EMPREL 

The basic means of aiding the process of knowledge acquisition is the data 
and knowledge base. The following observations were taken into account while 
developing the logical scheme of this database (Moczulski, 1997): 

1. The description of objects of a given domain should take place in the so- 
called closed world, where one takes into consideration only a limited number 
of objects described by a limited number of attributes, each of them taking 
only a few values. 

2. To keep the database free of redundancy it is recommended to apply group
properties assigned to classes of objects.

3. Data concerning objects may be represented by means of statements in
which both quantitative and qualitative values may occur. Furthermore, there
are many applications connected with machinery operation and maintenance
where it is required to represent the results of measurements and
observations. In this case the sources of data need to be described as well.

4. It is convenient to represent declarative knowledge by means of sharp
rules, rough rules and decision trees. Moreover, the database should contain
definitions of values of belief in the plausibility of rules. The results of
the evaluation of individual rules are to be stored in the base, too.

Since conditions in premises of rules, descriptions of nodes in decision
trees, and statements used for representing objects are built from
constituent elements defined in the common description of the domain, it is
advisable to combine the knowledge base and the database into a single
whole. This solution facilitates the updating of the contents of both bases.
The kernel of the knowledge base and the database then contains, among other
things, the description of the domain, dictionaries of concepts, a thesaurus
of synonyms, etc. Taking these project guidelines into account, it is
recommended to store pieces of data and knowledge in a relational database,
which would be the central part of the system of programs running in the
Local and/or Wide Area Network environment.
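
As an illustration only, the idea of keeping statements about objects and
rules in one relational store, both built from a shared dictionary of
attributes and values, can be sketched with Python's built-in sqlite3 module.
All table names, column names and sample rows below are hypothetical; this is
not the actual EMPREL scheme described in (Moczulski, 1997).

```python
import sqlite3

# Hypothetical, much simplified schema; NOT the actual EMPREL scheme.
SCHEMA = """
CREATE TABLE attribute (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE attr_value (id INTEGER PRIMARY KEY,
                         attribute_id INTEGER NOT NULL REFERENCES attribute(id),
                         label TEXT NOT NULL);
-- Statements describe observed objects (the database part).
CREATE TABLE statement (object TEXT NOT NULL,
                        value_id INTEGER NOT NULL REFERENCES attr_value(id),
                        source TEXT);
-- Rules reuse the same dictionary (the knowledge-base part).
CREATE TABLE rule (id INTEGER PRIMARY KEY,
                   conclusion_id INTEGER NOT NULL REFERENCES attr_value(id),
                   belief REAL);
CREATE TABLE rule_condition (rule_id INTEGER NOT NULL REFERENCES rule(id),
                             value_id INTEGER NOT NULL REFERENCES attr_value(id));
"""


def build_demo():
    """Create an in-memory base with one statement and one rule (sample data)."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO attribute VALUES (?, ?)",
                    [(1, "vibration level"), (2, "state of balancing")])
    con.executemany("INSERT INTO attr_value VALUES (?, ?, ?)",
                    [(1, 1, "high"), (2, 2, "imbalance")])
    con.execute("INSERT INTO statement VALUES (?, ?, ?)",
                ("rotor A", 1, "measurement"))
    con.execute("INSERT INTO rule VALUES (?, ?, ?)", (1, 2, 0.8))
    con.execute("INSERT INTO rule_condition VALUES (?, ?)", (1, 1))
    return con


def fired_rules(con, obj):
    """Return (rule id, conclusion, belief) for rules whose conditions are
    all matched by statements recorded about the given object."""
    query = """
    SELECT r.id, v.label, r.belief
    FROM rule r JOIN attr_value v ON v.id = r.conclusion_id
    WHERE NOT EXISTS (
        SELECT 1 FROM rule_condition c
        WHERE c.rule_id = r.id AND c.value_id NOT IN (
            SELECT value_id FROM statement WHERE object = ?))
    """
    return con.execute(query, (obj,)).fetchall()
```

Because premises and statements refer to the same `attr_value` rows, renaming
a concept in the dictionary updates both the data and the rules at once,
which is the benefit the paragraph above attributes to a combined base.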

It is worth stressing that in elaborating the logical scheme of the
relational data and knowledge base EMPREL, a uniform knowledge
representation has been applied, regardless of the source from which the
knowledge originates. This representation of knowledge makes mixed
applications of methods of knowledge assessment possible (Moczulski, 1997).
Apart from typical applications, where:

• knowledge acquired from experts is assessed also by experts, 

• knowledge acquired from databases is assessed with the use of test 
examples, 

mixed applications are possible, when 

• knowledge acquired from experts is assessed by means of a set of test 
examples, 

• knowledge acquired from databases of examples is assessed by experts. 

A particularly interesting possibility consists in combining declarative
knowledge acquired from different sources. The so-called incremental
learning is most frequently used. Such an approach is especially useful if
in some domain there is a commonly acknowledged set of general rules, whose
sources may be, for example, accessible publications, and the knowledge base
to be constructed has to be applied to some particular cases of the general
class of machinery. Typical situations may concern a single machine that
features specific properties, which may be connected with its foundation,
the state of the subsoil of the foundation, the history of operation and
maintenance, the quality of maintenance, etc. The general rules may then be
subject to specialization during the process of incremental training.
Additional training data for this specialization may be obtained from the
analysis of data acquired from the object under consideration (e.g. signals
recorded during the diagnostic observation of this object), which may be
further diagnosed with the use of an expert system whose knowledge base
contains the rules to be specialized. The new ruleset obtained in this way
will contain general domain knowledge tailored to the specificity of the
machine to be diagnosed.

17.5.2. Means of acquiring diagnostic relationships from experts 

As has been mentioned earlier, experts are the basic knowledge source that
first of all provides the description (definition) of the application
domain. In the reported research the basic goal was to eliminate knowledge
engineers from participation in the knowledge acquisition process. Different
kinds of forms proved to be convenient means to this end (Moczulski, 1997).
They enable the domain expert to operate unaided and guide him/her through
successive stages of the process, in which he/she becomes aware of the
knowledge that he/she possesses, and then represents this knowledge using
selected means of knowledge representation.

Apart from activities described above, whose goal is to acquire new
knowledge, experts also take part in assessing knowledge acquired
previously, both from other experts, and with the use of automatic methods
of knowledge acquisition and discovery. The assessment of knowledge is less
demanding than the acquisition of knowledge. The common data and knowledge
representation scheme EMPREL is an important facility that aids human
experts in performing their task.

The developed forms were applied to the acquisition of declarative
knowledge represented by means of empirical diagnostic relationships. These
relationships may be used for drawing conclusions about the technical state
of the object on the basis of symptoms of this state and conditions of the
operation of the object being diagnosed (Moczulski, 1997).

A simple means used in the described research was the paper form, which
makes the acquisition of an individual diagnostic relationship possible
(Moczulski, 1997). The name of this method refers to the medium on which the
form is provided.

To allow the computer-based aiding of the process of knowledge acquisition
from experts with a simultaneous elimination of the participation of the
knowledge engineer, a specialized software application called the electronic
form has been introduced (Moczulski, 1997). The form may be used either to
assess relationships and rules acquired previously (Fig. 17.9), or to
acquire new diagnostic relationships (Fig. 17.10). A more comprehensive
description of the electronic form and other examples of its application are
contained in (Moczulski, 1997; Wylezol, 2000).

17.5.3. System of the acquisition of declarative knowledge

All the means developed and implemented so far have been integrated into 
a system of the acquisition of declarative knowledge - Fig. 17.11 (Moczulski, 
1997). It is possible to identify its subsystems as follows: 

1. The subsystem of the data and knowledge bases together with appli- 
cations that serve as user interfaces and interfaces between the base and 
different subsystems within the complete system. 

2. The subsystem of aiding knowledge acquisition from domain experts and 
the assessment (by experts) of knowledge that has been previously acquired 
either from experts or from databases. 

3. The subsystem of collecting data and examples for the induction of
declarative knowledge (they may come from experts, observations or
measurements, or from numerical experiments carried out with the use of
simulation software).

4. The subsystem of the induction of declarative knowledge by machine 
learning methods from databases of classified examples. 



Fig. 17.11. System of the acquisition of declarative knowledge (block
diagram not reproduced; legible block labels: program conducting the
simulation experiment; experts’ examples (in the database); format change;
induction of rules by covers; induction of decision trees; induction of
rough classifiers)















A central element of the system of the acquisition of declarative knowledge 
is a subsystem of data and knowledge bases and applications that operate on 
the contents of this base, as well as auxiliary applications. Due to the common 
format of data and knowledge representation, the base plays an integrating 
role with respect to the system under discussion. 

17.5.4. Means of acquiring procedural knowledge 

For the acquisition of procedural knowledge a specialized editor has been
developed, which facilitates the representation of procedures by means of
multi-layer block diagrams (Wylezol, 2000). It allows procedures to be
represented according to the general top-down principle. The editing of
procedures takes place in a graphical mode. The editor enables the user to
gradually increase the level of detail of the notation of the procedure. At
the very beginning the most general tasks are identified (the top level of
generality) and the sequence of carrying out these tasks is determined. The
tasks may further be expanded into individual procedures, which may in turn
be expanded into more and more detailed constituent tasks linked with edges,
until elementary tasks are obtained (the bottom level). The edges correspond
to an unconditional or conditional flow of control within the procedure.
Each constituent task of the procedure is written down into the knowledge
base, whose logical scheme is developed from the EMPREL scheme
(Wylezol, 2000).
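
The top-down expansion of tasks into layers can be sketched with a small
tree structure. This is a minimal illustration and not the editor described
by Wylezol (2000): the class, its methods and the task names are
hypothetical, and conditional edges are not modelled (subtasks are simply
executed in sequence).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    """A task in a multi-layer block diagram (illustrative structure only)."""
    name: str
    subtasks: List["Task"] = field(default_factory=list)

    def expand(self, *names: str) -> "Task":
        """Refine this task into an ordered sequence of more detailed subtasks."""
        self.subtasks = [Task(n) for n in names]
        return self

    def elementary(self) -> List[str]:
        """Return the elementary (bottom-level) tasks in execution order."""
        if not self.subtasks:
            return [self.name]
        return [leaf for t in self.subtasks for leaf in t.elementary()]

    def depth(self) -> int:
        """Number of layers from this task down to the deepest elementary task."""
        return 1 + max((t.depth() for t in self.subtasks), default=0)


# Top layer first, then one of its tasks is refined in the next layer.
procedure = Task("diagnose machine").expand("collect data", "evaluate state")
procedure.subtasks[0].expand("mount sensors", "record vibrations")
```

Each call to `expand` adds one layer of detail below a task, so the diagram
can be refined gradually while the upper layers remain unchanged.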



17.6. Examples of applications 

In the remaining part of the chapter selected examples of the application of 
the elaborated methodology are presented, concerning: 

• active diagnostic experiment, carried out at the stand called the Rotor
Kit (Moczulski, 1997),

• numerical experiment, conducted for a rotor-supports unit of a laboratory
stand (Moczulski and Wachla, 2001; Moczulski, 2001b).



17.6.1. Acquisition of declarative knowledge within 
the confines of the active experiment 

The experiment concerned diagnosing the state of the rotor installed in a
model of a rotating machine called the Rotor Kit. Faults such as different
states of imbalance (rough dynamic balancing, moment imbalance, and
quasi-static imbalance), a partial rub of the shaft against immobile parts,
and overloads of the bearing system were considered. The system was observed
within the framework of the active diagnostic experiment, in which both
elementary and complex technical states were evoked, the latter being
combinations of the elementary states enumerated above. The numbers of
observations carried out for each evoked state are shown in Table 17.2 (the
abbreviation Rub + Ovld denotes a simultaneous occurrence of both the rub
and the overload).

Table 17.2. Numbers of observations carried out for different
combinations of factors considered in the active experiment

                             Additional factors
State of balancing           None    Rub    Ovld    Rub + Ovld    Total
Rough dynamic balancing        13      -      12         -           25
Moment imbalance                7      -       -         -            7
Quasi-static imbalance         16     98      23         9          146
Total                          36     98      35         9          178

The examples are represented by means of vectors of attribute values that
describe orbits of the shaft motion against the immovable support and
discrete spectra of relative vibrations of selected points of the object.
Values of quantitative attributes were subject to discretization. For the
collected set of data, interpreted as learning examples, decision trees were
induced with the use of the C4.5 program (Quinlan, 1993). To assess the
classifier, random subsampling was applied with the ratio of learning to
testing examples amounting to 70 : 30. Unfortunately, a very poor
performance (58 percent) of the classifier was achieved. Hence, a hypothesis
was put forward (Moczulski, 1997) that the overly complex structure of the
set of technical states evoked during the active experiment was in fact the
reason for this poor performance. To improve the performance of classifiers,
different degrees of detail of diagnoses were introduced and the top-down
approach was applied in order to gradually increase the level of detail of
the diagnosis.
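
The random subsampling scheme with a 70 : 30 split can be sketched as
follows. This is a generic illustration, not the procedure actually run in
the cited work: the number of repetitions, the seed and the function names
are assumptions, and a trivial majority-class learner stands in for C4.5.

```python
import random
from collections import Counter


def random_subsampling(examples, learn, n_repeats=10, train_fraction=0.7, seed=0):
    """Estimate classifier accuracy by repeated random train/test splits.

    examples: list of (attribute_vector, class_label) pairs
    learn:    function taking training examples and returning predict(x)
    """
    rng = random.Random(seed)
    n_train = int(train_fraction * len(examples))
    accuracies = []
    for _ in range(n_repeats):
        shuffled = examples[:]
        rng.shuffle(shuffled)
        train, test = shuffled[:n_train], shuffled[n_train:]
        predict = learn(train)
        correct = sum(predict(x) == label for x, label in test)
        accuracies.append(correct / len(test))
    # The performance estimate is the mean accuracy over all random splits.
    return sum(accuracies) / len(accuracies)


def majority_learner(train):
    """Stand-in learner (not C4.5): always predicts the most frequent class."""
    most_common = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda x: most_common
```

Any learner with the same interface can be passed in place of
`majority_learner`; the splitting and averaging logic stays the same.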

The applied algorithm for optimizing the structure of the tree of states
resembles the algorithm for constructing an optimal decision tree. Three
auxiliary decision attributes (elementary attributes of state) were
introduced:

• state of balancing with the values: rough dynamic balancing, moment
imbalance, quasi-static imbalance,

• overload of bearings with the values: occurs, does not occur,

• rub with the values: occurs, does not occur.

Fig. 17.12. Decomposition of the set of examples for the active experiment
(descriptions in the text)



The structure of the tree of states was optimized with respect to the
criterion of the maximum performance of the classifier. It is easy to verify
that, due to the incomplete plan of the experiment, not all trees are
complete; hence the field of possible solutions includes only seven trees.
The search was aided by the author’s diagnostic knowledge. The optimal
structure of the tree, along with estimates of the error rate of the
classifier (written in the rectangles corresponding to individual states
located in the nodes of the tree), is shown in Fig. 17.12. A significant
improvement in the classifier’s performance was obtained, especially for
selected subtrees (such as the subtree starting from the node corresponding
to the state of quasi-static imbalance - cf. Fig. 17.12).
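
The decomposition of the example set along a tree of states can be
illustrated with the observation counts of Table 17.2. The sketch below
splits the set level by level for one attribute order chosen arbitrarily for
illustration; it is not the optimal structure of Fig. 17.12, and the function
and variable names are hypothetical.

```python
# Observation counts from Table 17.2, keyed by the three elementary state
# attributes: (state of balancing, rub occurs, overload occurs).
COUNTS = {
    ("rough dynamic balancing", False, False): 13,
    ("rough dynamic balancing", False, True): 12,
    ("moment imbalance", False, False): 7,
    ("quasi-static imbalance", False, False): 16,
    ("quasi-static imbalance", True, False): 98,
    ("quasi-static imbalance", False, True): 23,
    ("quasi-static imbalance", True, True): 9,
}

# Position of each elementary attribute within the state tuple.
ATTRIBUTES = {"balancing": 0, "rub": 1, "overload": 2}


def decompose(counts, order):
    """Split the example set level by level, following the attribute order.

    Returns a nested dict (one level per attribute) whose leaves are the
    numbers of examples in the corresponding subsets.
    """
    if not order:
        return sum(counts.values())
    idx = ATTRIBUTES[order[0]]
    groups = {}
    for state, n in counts.items():
        groups.setdefault(state[idx], {})[state] = n
    return {value: decompose(subset, order[1:]) for value, subset in groups.items()}


tree = decompose(COUNTS, ["balancing", "rub", "overload"])
```

Branches that were not covered by the experiment plan (for example rub
combined with moment imbalance) simply do not appear, which is why not every
attribute order yields a complete tree.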

17.6.2. Acquisition of declarative knowledge within the framework 
of the numerical experiment 

The example concerns the application of the described methodology to the
acquisition of knowledge from the database containing the results of the
numerical experiment. The description is based on the publications
(Moczulski and Wachla, 2001; Moczulski, 2001b). The experiment concerned the
identification of different kinds of imbalance of a rotor supported in two
hydrodynamic journal bearings. Only elementary states of imbalance were
considered, playing the role of the technical state of the object. For each
class of imbalance a similar number of examples was generated according to
the plan of the experiment, in which values of control attributes were
changed in a very systematic way¹. The most important features of the
database are shown in Table 17.3.

Table 17.3. Features of the database obtained in the numerical experiment

Property                                      Number
Control attributes:                                7
  - Attributes of operating conditions             1
  - Attributes of imbalance distribution           6
Dependent attributes                              16
Decision attributes                                1
Records                                         5076
Classes                                            5



The problem being discussed includes tasks of fault detection and fault
diagnosis, formulated as follows:

• fault detection consisted in stating whether imbalance is significant and
of what kind it is;

• fault diagnosis aimed at the identification of imbalance distribution
along the shaft of the modeled object.

Since the range of the variation of control attributes was limited in order
to make the application of a linear model to kinetostatic calculations
possible (Cholewa and Kiciński, 1997), very regular (elliptic) orbits of the
motion of the shaft centerline (both the absolute motion and relative motion
against the corresponding bearing bushing of the modeled machine) were
encountered.

17.6.2.1. Acquisition of knowledge for aiding the detection of imbalance 

In the case being considered the detection of imbalance is possible by
typical classification, which may be of hierarchical character:

• a binary classifier may be used for stating whether the object is usable
or not, i.e., whether imbalance is not excessive;

• a multi-class classifier may then be used to state which of the possible
types of imbalance is encountered.
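
The hierarchical variant described in the two points above can be sketched as
a two-stage decision. The sketch only shows how the stages are chained; the
two toy classifiers, their attribute names and thresholds are purely
hypothetical stand-ins for learned classifiers.

```python
def hierarchical_classify(example, is_usable, imbalance_type):
    """Two-stage classification: a binary check first, then a multi-class one."""
    if is_usable(example):
        return "usable"
    # The multi-class classifier is consulted only for unusable objects.
    return imbalance_type(example)


# Toy stand-ins for learned classifiers (hypothetical attributes and limits).
def toy_is_usable(example):
    return example["amplitude"] < 50.0


def toy_imbalance_type(example):
    if abs(example["phase_shift"]) < 10.0:
        return "static imbalance"
    return "moment imbalance"
```

A design consequence is that the second classifier can be trained only on
examples of excessive imbalance, which simplifies its task.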

In the described case it was decided that knowledge would be acquired in
the form of decision trees, which allow a single-step classification of all
five distinguished states of imbalance. To induce the decision trees, the
C4.5 program (Quinlan, 1993) was applied. The performance of the learned
classifier reached 94.3 percent (Table 17.4).

¹ A more comprehensive description of this experiment can be found on the
website http://www.polsl.gliwice.pl/~moczulsk/spie_challenge.



Table 17.4. Summary of the results of estimating the classifier’s
performance for data generated in the numerical experiment

Name of the technical state       Number of test examples   Correctly classified [%]
roughly dynamically balanced                972                       93.1
static imbalance                            876                       98.9
quasi-static imbalance                      876                       92.5
moment imbalance                            875                       99.2
dynamic imbalance                           970                       88.5
Total                                      4569                       94.3
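
As a quick consistency check of Table 17.4, the overall figure can be
recomputed as the average of the per-class results weighted by the numbers of
test examples. The weighting scheme is an assumption made here; the source
does not state how the total was obtained.

```python
# (number of test examples, correctly classified [%]) from Table 17.4
RESULTS = {
    "roughly dynamically balanced": (972, 93.1),
    "static imbalance": (876, 98.9),
    "quasi-static imbalance": (876, 92.5),
    "moment imbalance": (875, 99.2),
    "dynamic imbalance": (970, 88.5),
}

total = sum(n for n, _ in RESULTS.values())
# Weighted mean of the per-class percentages.
overall = sum(n * pct for n, pct in RESULTS.values()) / total
print(total, round(overall, 1))  # prints: 4569 94.3
```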



17.6.2.2. Discovery of knowledge for aiding the diagnosis of imbalance 

The second task was far more difficult to solve. To make the most exact
predictions of imbalance distribution along the shaft possible, it was
decided to acquire knowledge in the form of functional dependencies
(equations). A more detailed description of the procedure and the obtained
results can be found in (Moczulski and Wachla, 2001). In what follows only
the most important issues concerning the solution of the problem are
addressed.

From Table 17.3 it follows that attributes contained in the source database
may be considered as the following variables:

• the scalar value $u$ representing the operating conditions of the object,

• the matrix $x = [x_{ij}]$ of parameters of the technical state of the
object (here of imbalance distribution), where $i = 1, 2$ identifies a
particular imbalance, and $j = 1, 2, 3$ determines its magnitude and
location (both the angular location and the location along the shaft), and

• the matrix $y = [y_{kl}]$ of values of symptoms of the technical state of
this object, where $k = 1, \ldots, 4$ identifies the plane in which the
orbit of the shaft motion is observed, while $l = 1, \ldots, 4$ refers to
one of four attributes that completely represent each ellipse of vibrations
of the shaft in its measuring plane.

Taking into account the general notation introduced above, one may write
down the causal-consecutive dependency as

$$y = F(u, x) \tag{17.42}$$

and the corresponding diagnostic dependency (if it exists) as

$$x = G(u, y), \tag{17.43}$$

where $F$ and $G$ are some mappings (usually nonlinear).

The resulting systems of equations that correspond to the matrix equations
(17.42) and (17.43) are underdetermined and overdetermined, respectively.
These systems may then be solved if some attributes are fixed.
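
Written element by element, with the index ranges given in the definitions of
$x$ and $y$ above, the two matrix relations take the following form. The
component symbols $F_{kl}$ and $G_{ij}$ are introduced here only for
illustration and do not appear in the source.

```latex
\begin{align*}
y_{kl} &= F_{kl}(u,\, x_{11}, \ldots, x_{23}),
  & k, l &= 1, \ldots, 4 \quad \text{(16 scalar relations)},\\
x_{ij} &= G_{ij}(u,\, y_{11}, \ldots, y_{44}),
  & i &= 1, 2,\; j = 1, 2, 3 \quad \text{(6 scalar relations)}.
\end{align*}
```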

Causal-consecutive equations. In the discussed case the BACON-3 methodology
was applied in order to discover causal-consecutive dependencies according
to (17.42). Very simple model structures according to (17.33) were
considered, such as linear and quadratic ones. Nevertheless, a relatively
small prediction error rate was achieved for the analysed subsets of records
of this database. It should be noted, however, that with the described
methodology even such simple model structures may lead to relatively complex
nonlinear dependencies. The prediction error rates obtained for an exemplary
equation (Table 17.5) do not exceed 6.3 percent in absolute value, hence
they are relatively small.

Table 17.5. Prediction error rate for an exemplary causal-consecutive equation

 x11    x12    x21    x22        y11    y11 (predicted)    δ [%]
  90      0     90     90     845.44         810.26        -4.16
 203      0     90     90    1448.83        1453.48         0.32
 360      0     90     90    2396.10        2347.16        -2.04
  90      0     90    270     801.54         761.03        -5.05
 203      0     90    270    1391.80        1396.19         0.32
 360      0     90    270    2335.36        2278.67        -2.43
  90     90     90      0     801.59         751.22        -6.28
 203     90     90      0    1391.80        1384.81        -0.50
 360     90     90      0    2335.36        2265.10        -3.01
  90     90     90    180     845.44         800.43        -5.32
 203     90     90    180    1448.83        1442.04        -0.47
 360     90     90    180    2396.09        2333.48        -2.61
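
The tabulated error rates are consistent with a signed relative error of the
predicted value with respect to the actual one. The formula used below is an
assumption inferred from the numbers in Table 17.5 (the text of this section
does not state it), and the function name is hypothetical.

```python
def relative_error_percent(actual, predicted):
    """Signed relative prediction error in percent."""
    return 100.0 * (predicted - actual) / actual


# (actual y11, predicted y11, tabulated error [%]) for three rows of Table 17.5
ROWS = [
    (845.44, 810.26, -4.16),
    (1448.83, 1453.48, 0.32),
    (801.59, 751.22, -6.28),
]

for actual, predicted, tabulated in ROWS:
    delta = relative_error_percent(actual, predicted)
    print(f"{delta:6.2f}  (table: {tabulated:6.2f})")
```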



Diagnostic equations. In this case two different approaches were applied in
order to obtain the inverse dependencies (17.43).

For the analysed database a direct discovery of inverse dependencies, after
switching the roles of the attributes, did not give interesting results. The
application of contingency tables showed that there are no statistically
significant regularities in this database that might be further represented
by means of equations. Therefore, another approach was used, which consisted
in a symbolic solution of the system of equations with the simultaneous
fixing of the value of the parameter $u$. Selected results of the
application of the inverse dependency for fixed values of two attributes are
shown in Table 17.6. The prediction error rate in some cases reached 16
percent. Nevertheless, these results still make a rough estimation of
imbalance distribution along the shaft possible.



Table 17.6. Prediction error rate for an exemplary inverse equation

     Variable            Fixed                 Results
     y11      y12      x12    x22     x11     x11 (result)    δ [%]
  845.44   215.18        0     90      90        101.89       13.21
 1448.83   371.78        0     90     203        210.72        3.80
 2396.10   617.06        0     90     360        370.69        2.97
  801.54   207.11        0    270      90         91.69        1.88
 1391.80   361.36        0    270     203        213.51        5.18
 2335.36   606.01        0    270     360        352.48       -2.09
  801.59   207.25       90      0      90         75.46      -16.16
 1391.80   361.36       90      0     203        188.87       -6.96
 2335.36   606.10       90      0     360        312.22      -13.27
  845.44   215.18       90    180      90         83.50       -7.22
 1448.83   371.78       90    180     203        194.05       -4.41
 2396.09   617.06       90    180     360        356.87       -0.87



17.7. Summary 

In the present chapter a methodology of knowledge acquisition in the domain 
of technical diagnostics has been presented. The discussed methods are used 
for the acquisition of both declarative and procedural knowledge. They are 
adapted to different knowledge sources, such as domain experts or databas- 
es. Particular attention has been paid to automatic methods of knowledge 
acquisition from examples, including both inductive machine learning meth- 
ods and methods of discovering qualitative and quantitative dependencies in 
databases. A method worth mentioning is that of knowledge acquisition in the
case of complex states, which consists in a gradual decomposition of the set
of examples into subsets. The decomposition is carried out with respect to
the structure of the tree of states. Since many different structures of the
tree of states are possible, the optimal structure is selected with respect
to the criterion of the classifier’s maximum performance.

Selected examples of the application of the described methods have been 
given. They are focused on solutions of practical problems of knowledge ac- 
quisition in the technical diagnostics of machinery and processes. 

The further development of the methods presented in the chapter is ex- 
pected to concern: 

• with regard to the acquisition of knowledge about complex technical
states: determining the structure of a set of examples on the basis of
experts’ knowledge and suitably directing the search for an optimal
structure (to avoid combinatorial explosion);

• with regard to the reduction of information and the selection of
attributes: an optimal selection of quantization thresholds for the
discretization of quantitative attributes, and the selection of attributes
sensitive to individual faults;

• with regard to the acquisition of knowledge about dynamic objects and
processes: the development of new methods of knowledge representation and
acquisition from sequences of events and observations;

• with regard to knowledge discovery: collecting databases suitable for
discovering static and dynamic knowledge, and developing methods of the
discovery of static and dynamic knowledge.



Chapter 18



STATE MONITORING ALGORITHMS FOR
COMPLEX DYNAMIC SYSTEMS



Jan Maciej KOSCIELNY* 



18.1. Introduction 

In the case of industrial processes (systems), diagnoses should be formulat- 
ed on-line and in real time. Such a method of diagnosing is called system 
state monitoring (Koscielny, 2001). This chapter presents state monitoring 
algorithms for complex industrial installations. 

At the beginning, problems with diagnosing complex dynamic systems are
formulated. Taking them into account and solving them is necessary for the
correct operation of each industrial diagnostic system. A general strategy
of state monitoring, as well as its particular phases such as fault
detection, isolation, identification, and the detection of a return to the
normal state, is presented. State monitoring algorithms using the DTS
(Dynamic Tables of State), F-DTS and T-DTS methods are given. They differ in
the notation used for diagnostic relations (the binary diagnostic matrix,
the information system), in the inference rules (classical or fuzzy logic,
series or parallel inference), and in the way the dynamics of symptoms that
appear in the system are taken into account. The presented examples
illustrate the features of particular system state monitoring algorithms.



18.2. Practical problems

Many problems occur when diagnosing complex industrial installations,
including the following:

• Different symptoms caused by the same fault occur at different moments of
time. The dynamics of the occurrence of symptoms should therefore be taken
into account in inference algorithms.

• The system structure changes during its operation. Particular parts of
the installation can be switched off or on. Therefore, the set of measuring
devices also changes.

• Single as well as multiple faults can occur.

• The number of possible states of the system is very high, especially when
multiple faults are taken into account. Therefore, skilful limiting of the
set of the analysed states of the system is advisable.

• Knowledge about the diagnosed system is not uniform. System models are
known for certain parts of the installation, but for others only heuristic
dependences or limits are available. Moreover, knowledge usually tends to
grow deeper during the operation of the diagnostic system. State monitoring
systems should therefore allow integrating different fault detection methods
with one isolation algorithm. It should also be possible to expand the
diagnostic system easily during the operation of the diagnosed system, as
knowledge about it becomes deeper and deeper.

• In the case of some systems, diagnosing time may have to be limited.
Diagnoses should be sufficiently quick in order to ensure the possibility of
undertaking effective preventive actions in states with faults.

18.2.1. Dynamics of the occurrence of symptoms 

The diagnosed system is a dynamic one; therefore a certain amount of time
elapses between the occurrence of a fault and a measurable symptom of its
appearance. The time depends, among other things, on the dynamic properties
of the tested part of the system. The same fault is detected by the
different tests that control it after non-identical intervals of time.
Therefore, if a set of diagnostic signals that detect a particular fault is
considered, only some of the signals have values being the fault symptoms at
a certain moment of time after the fault occurs. Only after a longer time
interval do all of the signals have values which testify to the occurrence
of the fault. In this discussion, problems concerning the uncertainty of the
occurrence of symptoms are left aside.

False diagnoses can be generated if one does not take into account the
dynamics of the occurrence of symptoms. The problem is illustrated with the
help of an example. Let us consider a binary diagnostic matrix (Table 18.1)
created for four tests shown in Table 3.10.

Table 18.1. Binary diagnostic matrix for the three-tank set

Let us assume that the fault $f_3$ appeared. The fault is detected by the
diagnostic signals $s_2$, $s_3$, and $s_4$. Let us assume (ignoring physical
realities in this case) that the time moments of the occurrence of symptoms
are different for particular diagnostic signals, and equal 2, 4, and 6 time
units (seconds), respectively. The inference process will be as follows:

a) Within the 0-2 time interval, no fault is detected, and the diagnosis
cannot be generated.

b) Within the 2-4 time interval, the set of the obtained diagnostic signals
has the form $S = \{0, 1, 0, 0\}$. The generated diagnosis
$DGN_{2-4} = \{f_{12}\}$ indicates a leak from the first tank.

c) Within the 4-6 time interval, the set of the obtained diagnostic signals
has the form $S = \{0, 1, 1, 0\}$. The generated diagnosis
$DGN_{4-6} = \{\{f_2\}, \{f_9\}\}$ indicates a fault of the level $L_1$
measurement line, or partial clogging of the channel between the tanks 1
and 2.

d) When 6 seconds elapse, the set of the obtained diagnostic signals has
the form $S = \{0, 1, 1, 1\}$. The generated diagnosis $DGN_{>6} = \{f_3\}$
indicates a fault of the level $L_2$ measurement line.
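
The sequence of premature diagnoses in steps a) to d) can be reproduced with
a few lines of code. The sketch below is only an illustration: the four fault
signatures are inferred from the signal vectors and diagnoses quoted in the
example (the full binary diagnostic matrix is not reproduced here), and the
isolation rule, exact matching of the observed vector against signatures, is
a simplifying assumption.

```python
SIGNALS = ("s1", "s2", "s3", "s4")

# Signatures inferred from the example in the text (order: s1, s2, s3, s4).
# Only the faults named in the example are listed.
SIGNATURES = {
    "f2": (0, 1, 1, 0),   # fault of the level L1 measurement line
    "f3": (0, 1, 1, 1),   # fault of the level L2 measurement line
    "f9": (0, 1, 1, 0),   # partial clogging of the channel between tanks 1 and 2
    "f12": (0, 1, 0, 0),  # leak from the first tank
}

# Symptom delays [s] assumed in the example for the fault f3.
DELAYS_F3 = {"s2": 2, "s3": 4, "s4": 6}


def signals_at(t):
    """Diagnostic signal values observed t seconds after f3 occurs."""
    return tuple(int(t >= DELAYS_F3.get(name, float("inf"))) for name in SIGNALS)


def diagnose(observed):
    """Faults whose signature equals the observed signal vector."""
    return sorted(f for f, sig in SIGNATURES.items() if sig == observed)


for t in (1, 3, 5, 7):
    print(t, signals_at(t), diagnose(signals_at(t)))
```

Running the loop shows an empty diagnosis first, then `f12`, then `f2` and
`f9`, and only after the last symptom has appeared the correct fault `f3`.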

Only the last diagnosis is final and accurate. Those obtained earlier were 
false. They were formulated too early, before the diagnostic signal values 
settled after the occurrence of the fault. In order to avoid false diagnoses 
during the system state monitoring process, one should take into account the 
dynamics of the occurrence of symptoms. Time moments of the occurrence 
of symptoms depend on: 

• the dynamic properties of the tested part of the system (the lower the 
delays and the order of the controlled part of the system, the shorter the 
duration of the symptom), 

• the way in which faults develop in time (abrupt faults exert
a different influence than slowly growing, incipient faults, e.g., those
corresponding to the wear of elements),

• limit values (the narrower the admissible range of residual values, the
shorter the detection time but the higher the possibility of false alarms),

• the method of diagnostic signal evaluation (e.g., the threshold or fuzzy
evaluation of residuals; instantaneous evaluation, or evaluation in a moving
window),

• the system point of operation at the time of a fault if alarm limits 
control is applied (e.g., a leak from a tank is detected earlier if the level of 
liquid in the tank before the fault occurred was closer to the lower alarm
limit than to the higher one), 

• the method of the software realisation of tests. 

The times of the occurrence of symptoms therefore depend on the dynamic
properties of the tested part of the system, as well as on the detection
method applied and the parameters of the detection algorithm. The times can
be calculated analytically on the basis of a description of the dynamics
(e.g., transfer functions) of the controlled part of the system (whose
input is a fault, and whose output is the measured signal), as well as the
time characteristics of the occurrence of a fault. It is assumed that the
limit function parameters are known, and that the effect of the way the test
is implemented by the computer can be neglected. However, such calculations
are very difficult since they require modelling the effect of faults on
the measured outputs. The accuracy of the analytical evaluation of these
times is rather low due to the inaccuracy of models as well as the
lack of knowledge about the dynamics of the appearance of faults.

The times of the occurrence of symptoms can be evaluated in practice
on the basis of the knowledge of the diagnosed system as well as detection
algorithms by giving their minimum and maximum values: $\theta_{kj}^1$ denotes
the minimum time interval between the occurrence of the k-th fault and the
occurrence of the j-th symptom, and $\theta_{kj}^2$ denotes the maximum time
interval between the occurrence of the k-th fault and that of the j-th
symptom. The above parameters can be expressed in seconds or dimensionless
units that correspond to multiples of the minimum sampling period of the
process variables. The two parameters $\theta_{kj}^1$ and $\theta_{kj}^2$, which
correspond to the dynamics of the occurrence of symptoms, can be assigned to
each ordered pair of a fault and a diagnostic signal $\langle f_k, s_j \rangle$
that fulfils the relation $R_{FS}$.

In order to avoid false diagnoses during system state monitoring, they 
should be formulated only after the symptom values have settled. Therefore 
it is sufficient to take into account only the maximum time intervals of 
the occurrence of symptoms. The problem can be simplified even further by 
attributing a cumulative symptom time $\theta_j$ to each one of the tests (diagnostic
signals). The cumulative symptom time is defined as the maximum time 
interval between the appearance of any fault checked by a given test and the 
moment the symptom is detected by the test: 

$\theta_j = \max_{k:\, f_k \in F(s_j)} \theta_{kj}^{2}$,    (18.1)

where $F(s_j)$ denotes the set of faults detected by the diagnostic signal $s_j$.
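
The definition of the cumulative symptom time can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the dictionary layout, the function name `cumulative_symptom_times`, and all numeric values are hypothetical and are not taken from the DTS publications.

```python
def cumulative_symptom_times(theta_max):
    """Return {signal: theta_j}, the largest maximum symptom time per signal.

    theta_max maps (fault, signal) pairs to the maximum symptom time.
    Only pairs in which the signal detects the fault are present.
    """
    theta = {}
    for (fault, signal), t in theta_max.items():
        theta[signal] = max(theta.get(signal, 0.0), t)
    return theta


# Hypothetical maximum symptom times in seconds.
theta_max = {
    ("f2", "s2"): 2.0,
    ("f3", "s2"): 2.0,
    ("f3", "s3"): 4.0,
    ("f9", "s3"): 3.0,
    ("f3", "s4"): 6.0,
}
theta = cumulative_symptom_times(theta_max)
```

Each signal receives one conservative waiting time, so the inference never has to look up fault-specific delays.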

Such a simplified method of describing the diagnosed system’s dynamic 
properties has been used in the DTS method, discussed in (Koscielny, 1991; 
1993; 1995a; 1995b), as well as in the F-DTS method (Koscielny et al., 1999;
Sędziak, 2001).

The order of the occurrence of symptoms is vital information, worth
taking into account in the process of diagnosing, since different orders of
the appearance of symptoms can characterise faults that are otherwise
indistinguishable (i.e., that have identical fault signatures). Taking into
account the dynamics of the occurrence of symptoms more completely allows us
to increase fault distinguishability and, in many cases, also to shorten the
time of diagnosing. A T-DTS monitoring algorithm that takes into account the
dynamics of the appearance of symptoms was presented in (Koscielny and
Zakroczymski, 2000; 2001). The algorithm uses minimum and maximum symptom
time values for particular faults.

18.2.2. Variation of the diagnosed system's structure 
and the set of measuring devices 

The set of available process variables (measured signals) X, and hence the
set of available diagnostic signals S, often changes during system state
monitoring. Diagnostic signals that detected previously recognised faults
also become temporarily useless. They cannot be applied until the state of
efficiency has been restored. The set of diagnostic signals (tests) available
at the moment of monitoring is a subset of the set of signals generated by
all of the detection algorithms. Changes of the operational structure of the
system (e.g., temporary switching-off of some technical devices) also cause
changes of the set of faults F that should be recognised.

It is therefore impossible to define once and for all the set of
indistinguishable elementary blocks or invariable rules of diagnostic
inference. They often change during the operation of the system. One should
therefore run all of the diagnostic inference operations in real time, taking
into account changes of the sets of process variables X, diagnostic signals S,
and faults F.

18.2.3. Diagnosing time limit 

System state monitoring is usually carried out in order to undertake
appropriate protective actions in recognised states with faults. Therefore,
it is necessary to know the maximum time $\tau_k$, counted from the occurrence
of a given fault, during which it is still possible to undertake an effective
action that protects the system (process), because this time determines the
admissible time interval for developing the diagnosis. In order to limit the
time required for diagnosing, it is therefore necessary to know the following
function:



$q_\tau : F \to \mathbb{R}^{+}$,    (18.2)

$\tau_k = q_\tau(f_k)$.    (18.3)

If the set of tests useful for fault isolation contains tests whose symptom
times are longer than the time allowed for diagnosing, then the diagnosing
process is realised in two stages. At the first one, a diagnosis achievable
within the existing time limits is developed. The diagnosis is used for
choosing an appropriate detection algorithm. At the second stage, the
diagnosis is refined or additionally verified on the basis of the remaining
unused tests, whose times exceed the limit. The final result is delivered to
the system operators.



18.3. General strategy of the current diagnostics 
of industrial processes 

A block diagram of the system state monitoring process is shown in Fig. 18.1. 




Fig. 18.1. Block diagram of the system state monitoring process 

The system state monitoring process contains:

• cyclically realised fault detection, 

• fault isolation, 

• fault identification (optional), 

• the detection of a return to the state of efficiency,

• diagnosis visualisation and archiving,

• a modification of the set F of faults taken into account during in- 
ference, the set of available process variables X, and the set of diagnostic 
signals S possible to be applied at any moment of time. 

The strategy of the above state monitoring methods is adapted to the 
diagnosing of complex technical installations. The definition of a subsystem 
(subsystems) in which a fault appeared is formulated each time the fault 
symptoms are detected. The subsystem is defined by the subset of possible 
faults, as well as the subset of diagnostic signals necessary for fault isolation. 
These subsets are dynamically created in different diagnosing phases and 
correspond to certain subsets of the binary diagnostic matrix or the fault 
information system. Such a method of subsystem dynamic definition consid- 
erably limits the number of system states considered during inference, i.e., 
it limits the calculation costs for diagnosing in comparison with algorithms 
that take into account states of the complete system. 



18.4. Fault detection 

All of the tests belonging to the set D are realised cyclically by the computer 
with frequencies depending on the process variable sampling time. The sam- 
pling frequency of the variables is adapted to the process (system) dynamics. 
Let

$D_n = \{d_j : j = 1, 2, \ldots, J_n\} \subset D$    (18.4)

be the subset of active tests. The activity state of tests is changed
after each one of the diagnoses has been developed, and possibly by external
procedures that update the set of tests depending on the diagnosed system's
structure.

The subset of available diagnostic signals $S_n$ corresponds to the
subset $D_n$:

$S_n = \{s_j : j = 1, 2, \ldots, J_n\} \subset S$.    (18.5)

Active tests can detect faults that belong to the set

$F_n = \bigcup_{j:\, s_j \in S_n} F(s_j) = \{f_k : k = 1, 2, \ldots, K_n\} \subset F$,    (18.6)

where $F(s_j)$ is the set of faults detected by the diagnostic signal $s_j$.

Fault detection consists in examining diagnostic signals that belong to
the set $S_n$ and are results of active tests belonging to the set $D_n$. If
the results are positive (all signals equal zero), it is inferred that no
fault that belongs to the set $F_n$ has appeared.
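
The relation between active signals, detectable faults and the detection step can be sketched as follows. This is a minimal Python illustration, not the authors' implementation: the function names `detectable_faults` and `detect`, the mapping `faults_of`, and the sample data are assumptions introduced here.

```python
def detectable_faults(active_signals, faults_of):
    """Union of F(s_j) over the active diagnostic signals (the set F_n)."""
    result = set()
    for s in active_signals:
        result |= faults_of[s]
    return result


def detect(signal_values, active_signals):
    """Return the active signals whose current value is 1 (a symptom)."""
    return [s for s in active_signals if signal_values.get(s, 0) == 1]


# Hypothetical data: s3 is assumed to be temporarily switched off.
faults_of = {"s1": {"f1", "f2"}, "s2": {"f2", "f3"}, "s3": {"f4"}}
active = ["s1", "s2"]
f_n = detectable_faults(active, faults_of)
symptoms = detect({"s1": 0, "s2": 1, "s3": 1}, active)
```

In this sketch the value of the inactive signal s3 is ignored, so the fault f4 is neither detectable nor reported until s3 becomes active again.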



18.5. Fault isolation on the assumption about single faults 

Three methods of diagnosing: DTS, F-DTS, and T-DTS will be presented. 
The name of the DTS method was introduced by (Koscielny, 1991), and 
comes from an accepted rule of inference using a dynamically defined table of 
state being a subset of the complete table of state. The F-DTS and T-DTS 
methods are expansions of the DTS method. 

The DTS method applies the binary diagnostic matrix to the description
of the relation that exists between faults and symptoms. Inference is realised
with the use of classical logic. Rules of parallel or series diagnosing are
applied on the assumption about single or multiple faults. Fault isolation is
carried out with the aim of obtaining the maximum possible diagnosing accuracy
(distinguishability) with the set of diagnostic tests available at the moment
of diagnosing, taking into account possible limits on the time available for
developing the diagnosis. In order to avoid false diagnoses, cumulative
symptom times $\theta_j$ are taken into account during the inference process.
The cumulative symptom times are defined by the equation (18.1) and are
assigned to each one of the tests.

The algorithm of series diagnostic inference using the DTS method on
the assumption of the existence of single and multiple faults was discussed in
detail in (Koscielny, 1991; 1995a; 1995b; 2001). A variant of parallel inference
using the DTS method was described in (Koscielny, 1993; Koscielny and
Pieniążek, 1994). System state monitoring in the series version and on the
assumption of the existence of single faults will be presented below.

The F-DTS method is an expansion of the DTS method. The basic 
modifications consist in: 

• using the information system instead of the binary diagnostic matrix 
for the description of the relation fault-symptoms, 

• using fuzzy logic instead of the classical one for diagnostic inference.

These expansions make it possible to use a multi-valued fuzzy evaluation of
residuals and to take inference uncertainty into account.

The F-DTS method suggested by Sędziak (1996) was later developed by
(Koscielny, 1999; Koscielny et al., 1999). These papers present a variant of
parallel inference. A variant of series inference was presented in the doctoral
dissertation by Sędziak (2001).

The T-DTS method is also an expansion of the DTS method. The basic 
modifications consist in: 

• using the information system instead of the binary diagnostic matrix
for the description of the fault-symptoms relation,

• taking the symptoms-time relation into account during diagnostic
inference in order to further increase fault distinguishability.

The dynamics of the occurrence of symptoms is taken into account in 
the DTS and F-DTS methods only in order to provide protection against the 
generation of false diagnoses. In the T-DTS method, more accurate knowl- 
edge about the dynamics of the occurrence of symptoms is applied. Thanks 
to this, an additional increase in fault distinguishability, as well as shorten- 
ing the diagnosing time, is possible. The concept of taking into consideration 
the order of the occurrence of symptoms for fault distinguishing is sketched 
out in Chapter 3. The T-DTS method was presented in (Koscielny and
Zakroczymski, 2000; 2001; Zakroczymski, 1998). Series inference is used in the
algorithm.

18.5.1. DTS method 

The most important elements of the fault isolation algorithm using the DTS 
method on the assumption about the existence of single faults are presented 
below. 

Fault isolation procedure initiation. A negative value “1” of any one of
the diagnostic signals belonging to the set $S_n$ that was ascertained during
cyclical fault detection at the moment $t^1$ causes the initiation of the fault
isolation procedure. A hypothesis about the existence of a single fault is
initially assumed.

Definition of the subsystem in which the fault appeared. The subsystem
is defined by the subset of possible faults as well as the subset of
diagnostic signals necessary for fault isolation. The isolation begins with
defining the primary set of possible faults $F^1$ that at the same time is the
first, initial diagnosis $DGN^1$. All faults $F(s_j)$ detected by the diagnostic
signal $s_j$, whose result proved to be negative, should be included in that set:

$(t = t^1) \wedge (s_j = 1) \Rightarrow DGN^1 = F^1 = F(s_j)$.    (18.7)

Let us denote the power (cardinality) of the set $F^1$ by p. The following
states with single faults are considered during fault isolation:

$z_i = \{z(f_k) : f_k \in F^1\}$.    (18.8)

They are considered not for the complete system but only for its part defined
by faults belonging to the set $F^1$ (18.7). They create the set

$Z^1 = \{z_i : \sum_{k:\, f_k \in F^1} z(f_k) = 1\}$,  $|Z^1| = p$.    (18.9)

It corresponds directly to the set $F^1$, and further inference consists in
analysing the faults and not the system states.

The subset of diagnostic signals useful for the isolation of faults that
belong to the set $F^1$ is created according to the following rule:

$S^1 = \{s_j \in S_n : F^1 \cap F(s_j) \neq \emptyset\}$.    (18.10)

The set $S^1$ contains all signals that can be used for fault isolation
without taking into account the limit for the diagnosing time. Let us notice
that the set also contains the signal which was the first to detect the
fault symptom. A second analysis of its result is nevertheless advisable for
inference verification.
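
The dynamic definition of the subsystem can be sketched in Python as follows. The sketch assumes that the binary diagnostic matrix is stored as a mapping `faults_of` from each signal to the set of faults it detects; this layout, the function name `start_isolation`, and the sample data are hypothetical.

```python
def start_isolation(first_signal, active_signals, faults_of):
    """Return the initial fault set and the signals useful for isolation."""
    # Initial diagnosis: all faults detected by the signal that showed
    # the first symptom.
    f1 = set(faults_of[first_signal])
    # Useful signals: every active signal that shares a fault with f1.
    s1 = {s for s in active_signals if f1 & faults_of[s]}
    return f1, s1


faults_of = {
    "s1": {"f1", "f2"},
    "s2": {"f2", "f3"},
    "s3": {"f3", "f4"},
    "s4": {"f5"},
}
f1, s1 = start_isolation("s2", list(faults_of), faults_of)
```

Here s4 is left out of the inference because it detects none of the suspected faults, which is what keeps the dynamically defined table of state small.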

Definition of a limit for the diagnosing time as well as the subset of
diagnostic signals that fulfil the requirement. A limit for the diagnosing
time is initially defined as the minimum time $\tau_k$ (18.3) for faults that
belong to the set of possible faults $F^1$:

$\tau^1 = \min_{k:\, f_k \in F^1} \tau_k$.    (18.11)

At the first stage of fault isolation, only those diagnostic signals whose
symptom times $\theta_j$ are lower than the limit for the diagnosing time are
used. A set of diagnostic signals that fulfil the requirement is created:

$S^{1*} = \{s_j \in S^1 : \theta_j < \tau^1\}$.    (18.12)

If a limit for the diagnosing time is not used, then $S^{1*} = S^1$.

The set of possible faults is reduced during the inference process. The
elimination of the faults that have the lowest limit times $\tau_k$ from this set
allows us to relax the limit for the diagnosing time, and possibly to extend
the set of diagnostic signals that can be used with the modified limit. The
modification (after the r-th diagnostic signal has been analysed) is carried
out as follows:

$\tau^r = \min_{k:\, f_k \in F^r} \tau_k \geq \tau^{r-1}$,    (18.13)

$S^{r*} = \{s_j \in S^1 : \theta_j < \tau^r\} \supseteq S^{(r-1)*}$,    (18.14)

where $F^r$ denotes the set of possible faults during the r-th step of
diagnostic inference. If a limit for the diagnosing time is not used, then
$S^{r*} = S^1$.
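
The way the time limit and the admissible signal subset follow the shrinking set of possible faults can be sketched in Python. The function name `admissible_signals`, the dictionaries `theta` and `tau`, and the numbers are assumptions made for this illustration; the set of possible faults is assumed to be non-empty.

```python
def admissible_signals(possible_faults, s1, theta, tau):
    """Return the current time limit and the signals that fit within it."""
    # The limit is the smallest admissible diagnosing time among the
    # faults that are still possible.
    limit = min(tau[f] for f in possible_faults)
    # Only signals whose cumulative symptom time is below the limit qualify.
    usable = {s for s in s1 if theta[s] < limit}
    return limit, usable


theta = {"s1": 2.0, "s2": 4.0, "s3": 6.0}    # hypothetical symptom times
tau = {"f1": 5.0, "f2": 10.0, "f3": 10.0}    # hypothetical limit times
s1 = {"s1", "s2", "s3"}

first = admissible_signals({"f1", "f2", "f3"}, s1, theta, tau)
later = admissible_signals({"f2", "f3"}, s1, theta, tau)
```

Once the most time-critical fault f1 has been excluded, the limit grows from 5 to 10 time units and the slow signal s3 becomes usable.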

Ascertaining the order of diagnostic signal analysis. Values of diagnostic
signals that belong to the set $S^1$ are analysed in an order that depends on
the symptom times that correspond to these signals. The requirement for
the application of the j-th signal is as follows:

$(t - t^1) > \theta_j$.    (18.15)

It provides protection against the possibility of deriving a false diagnosis.
The moments of interpreting subsequent diagnostic signals, which create the
sequence, are defined as follows:

$\theta^1 \leq \theta^2 \leq \ldots \leq \theta^r \leq \ldots \leq \theta^p$,    (18.16)

where $\theta^r \in \{\theta_j : s_j \in S^1\}$, and the index r denotes the order
of the analysis of diagnostic signals that belong to the set $S^1$.

The values of particular diagnostic signals are analysed at successive
moments of time $t = \theta^r$ for $r = 2, \ldots, p$.

Reduction of the set of possible faults and inference verification.

Let us assume that $s_j^r$ is the diagnostic signal that is interpreted as the
r-th one, and that $DGN^r = F^r$ is the subset of possible faults after r signals
belonging to the set $S^1$ have been used (the diagnosis at the r-th step of
inference).

Interpreting the value of the signal $s_j^r$ leads to a reduction of the set
of possible faults, or allows us to verify the preceding diagnostic inference.
It is possible to formulate the following requirement for a diagnostic signal's
ability to distinguish faults (membership in the class $S_R$) at a given stage
of the fault isolation process:

$s_j^r \in S_R \Leftrightarrow [F^{r-1} \cap F(s_j^r) \neq \emptyset] \wedge [F^{r-1} \cap F(s_j^r) \neq F^{r-1}]$,    (18.17)

as well as the requirement for the usefulness of a diagnostic signal for
diagnostic inference verification (membership in the class $S_W$) during the
r-th step of fault isolation:

$s_j^r \in S_W \Leftrightarrow [F^{r-1} \cap F(s_j^r) = \emptyset] \vee [F^{r-1} \cap F(s_j^r) = F^{r-1}]$.    (18.18)

The requirements (18.17) or (18.18) need not be directly examined during
the analysis of successive diagnostic signals in the process of diagnosing.
They are examined in a simple way by attempting to reduce the set of
possible faults.

The reduction is carried out according to the following rule:

$s_j^r = 0 \Rightarrow [F^r = F^{r-1} - F^{r-1} \cap F(s_j^r)]$,    (18.19)

$s_j^r = 1 \Rightarrow [F^r = F^{r-1} \cap F(s_j^r)]$.    (18.20)

If $F^r \subset F^{r-1}$, then $s_j^r \in S_R$. The requirement is equivalent to the
requirement (18.17). The fault isolation process consists in reducing the set
of possible faults on the basis of the results of diagnostic signals analysed one
by one in the order of the symptom times that correspond to the signals. The
process operator can be informed about the current diagnosis $DGN^r = F^r$.

If $F^r = F^{r-1}$ or $F^r = \emptyset$, then $s_j^r \in S_W$. The requirement is
equivalent to the requirement (18.18). The diagnostic signal is used for the
verification of the results of the preceding inference. The values of
verifying diagnostic signals can be foreseen a priori according to the
following formulae:

$[F^{r-1} \cap F(s_j^r) = \emptyset] \Rightarrow v_j^r = 0$,    (18.21)

$[F^{r-1} \cap F(s_j^r) = F^{r-1}] \Rightarrow v_j^r = 1$,    (18.22)

where $v_j^r$ denotes the foreseen value of the diagnostic signal $s_j^r$.

In the case of (18.21), the requirement $F^1 \cap F(s_j) \neq \emptyset$ was
fulfilled for the signal $s_j$ at the moment $t^1$, which results from (18.10), but
$F^{r-1} \cap F(s_j^r) = \emptyset$ is fulfilled at the moment $t = \theta^r$. It
means that the common elements of the sets $F^1$ and $F(s_j)$ have been eliminated
from the set of possible faults on the basis of previously interpreted
diagnostic signals. Therefore, it is possible to infer a priori that the value
of the signal $s_j^r$ should equal zero (a positive test result). In the case of
(18.22), the signal $s_j^r$ controls all elements that belong to the subset of
possible faults $F^{r-1}$, therefore one should infer a priori that the signal
value should equal one (a negative test result).

Diagnostic signals that verify the current diagnosis are therefore redundant.
They allow us to confirm the correctness of earlier diagnosing when
the obtained diagnostic signal values are consistent with the values foreseen
according to the equation (18.21) or (18.22). Inconsistency between the real
and foreseen values reveals discrepancies in the diagnostic inference
process.

The correctness of inference can also be examined in a different way than
on the basis of the equation (18.21) or (18.22). The fulfilment of the
requirement $F^r = F^{r-1}$ confirms the correctness of previous diagnostic
inference, while the requirement $F^r = \emptyset$ testifies to the possibility of
the existence of multiple faults, or to a false value of one or more of the
diagnostic signals.
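
The reduction and verification steps described above can be combined into a compact Python sketch of series inference for a single fault. It is an illustration only: the function `dts_isolate`, the callback `read_signal`, the labels in the log, and the fault signatures are hypothetical, and the sketch assumes that each signal is read only after its symptom time has elapsed.

```python
def dts_isolate(f1, s1, theta, faults_of, read_signal):
    """Series inference for a single fault; returns (diagnosis, log).

    read_signal(name) must return 0 or 1 for the given diagnostic signal.
    """
    possible = set(f1)
    log = []
    # Signals are interpreted in the order of increasing symptom times.
    for s in sorted(s1, key=lambda name: theta[name]):
        common = possible & faults_of[s]
        if read_signal(s) == 1:
            reduced = common                # symptom: keep detected faults
        else:
            reduced = possible - common     # no symptom: drop them
        if not reduced:
            # Empty set: multiple faults or a false signal value.
            log.append((s, "inconsistent"))
            return set(), log
        role = "reducing" if reduced < possible else "verifying"
        log.append((s, role))
        possible = reduced
    return possible, log


faults_of = {
    "s2": {"f2", "f3", "f9", "f12"},
    "s3": {"f2", "f3", "f9"},
    "s4": {"f3"},
}
theta = {"s2": 2.0, "s3": 4.0, "s4": 6.0}
readings = {"s2": 1, "s3": 1, "s4": 1}
diagnosis, log = dts_isolate(
    faults_of["s2"], set(faults_of), theta, faults_of, readings.get
)
```

With these readings the first signal only confirms the initial hypothesis, while the next two each narrow the set until a single fault remains.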

Generation of momentary diagnoses. All diagnostic signals used in a
given fault isolation process create the set $S_W(r)$, which is defined after the
r-th signal has been analysed as follows:

$S_W(r) = \{s_j^1, \ldots, s_j^r\} \subseteq S^1$.    (18.23)

The set $F^r$ can be interpreted as an instantaneous diagnosis defined by the
sequence of the diagnostic signals $s_j^1, \ldots, s_j^r$ applied:

$DGN / s_j^1, \ldots, s_j^r$.    (18.24)

Formulating a diagnosis with a limit for the diagnosing time. After
the r-th diagnostic signal has been applied, it is possible to examine whether
or not all of the diagnostic signals whose symptom times are lower than
the limit time have been applied. If

$S^{r*} - S_W(r) \neq \emptyset$,    (18.25)

then there still exist signals that can be interpreted before the limit time
elapses. In such a case, the signal that has the next higher symptom time is
interpreted. However, if

$S^{r*} - S_W(r) = \emptyset$,    (18.26)

it means that all signal results that fulfil the requirement for the diagnosing
time have been applied. In such a case (for r = a), an initial diagnosis is
formulated:

$[S^{r*} - S_W(r) = \emptyset] \Rightarrow DGN(\tau) = \{f_k \in F^r;\ r = a\}$.    (18.27)

It contains faults that belong to the set of possible faults defined on the
basis of the values of all diagnostic signals that have symptom times
shorter than the limit time for diagnosing. The set contains faults that are
indistinguishable with the use of the subset of diagnostic signals $S^{r*}$.
Necessary protective actions are undertaken on the basis of the initial
diagnosis. Further diagnosing that aims at refining the diagnosis is carried
out according to one of the following variants:

Diagnosis formulation with the aim of minimising the diagnosing
time. Minimising the diagnosing time (for the set of signals $S^1$) while
simultaneously ensuring the maximum accuracy of each diagnosis (Koscielny,
1991; 2001) consists in looking for the first signal in the sequence (18.16)
whose analysis yields the maximum fault distinguishability. If this requirement
had not been fulfilled before the time limit for the initial diagnosis
formulation elapsed, then the inference is carried on for signals that belong
to the set $(S^1 - S^{r*})$.

Successive diagnostic signals are analysed according to the rules given
above. Each time it is examined whether or not there still exist in the set
$(S^1 - S^{r*})$ signals useful for distinguishing faults that belong to the set
$F^r$. If the requirement (18.18) is fulfilled for each $s_j \in S^1 - S^{r*}$, then
the further reduction of the set of possible faults $F^r$ is not possible. Let
the variable r then take the value r = b. The following diagnosis is
formulated:

$\forall\, s_j \in (S^1 - S^{r*}) : s_j \in S_W \Rightarrow DGN(T_D) = \{f_k \in F^r;\ r = b\}$.    (18.28)

It contains all of the faults that belong to the set of possible faults $F^r$ that
has been defined on the basis of the results of the subset of diagnostic signals
$S_W(r) \subseteq S^1$. The obtained accuracy of the diagnosis is the highest
possible one and the diagnosing time is the lowest possible one.

Since all possible verifying diagnostic signals (i.e., the ones for which
the symptom time is shorter than the diagnosis elaboration time) are applied
during the inference process, the minimum probability of false diagnoses
that can be obtained within this time interval is also achieved.
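
The stopping test used when the diagnosing time is minimised can be sketched as a small Python predicate. The name `can_still_reduce` and the sample signatures are hypothetical; the check itself mirrors the distinguishing requirement (18.17) applied to the signals that have not been analysed yet.

```python
def can_still_reduce(possible, remaining_signals, faults_of):
    """True if some remaining signal can still split the set of faults."""
    for s in remaining_signals:
        common = possible & faults_of[s]
        # A signal distinguishes faults only if it detects some,
        # but not all, of the faults that are still possible.
        if common and common != possible:
            return True
    return False


faults_of = {
    "s2": {"f2", "f3", "f9", "f12"},
    "s3": {"f2", "f3", "f9"},
    "s4": {"f3"},
}
keep_going = can_still_reduce({"f2", "f3", "f9"}, ["s4"], faults_of)
stop_now = can_still_reduce({"f2", "f9"}, ["s4"], faults_of)
```

When the predicate returns False, the remaining signals can only verify the diagnosis, so the time-optimal variant may formulate it at once.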

Diagnosis formulation with the aim of minimising the possibility of
false diagnoses. The minimum probability of a false diagnosis is achieved
after the analysis of all diagnostic signals that belong to the set $S^1$ has
been carried out. If not all of the signals have been applied, then the
possibility of the further verification of diagnostic inference exists even if
distinguishing faults is not possible. The values of the remaining diagnostic
signals are interpreted in the order of increasing symptom times. The fault
isolation process ends after all p diagnostic signals that belong to the set
$S^1$ have been applied:

$[S^1 - S_W(r) = \emptyset] \Rightarrow DGN(A) = \{f_k \in F^r;\ r = p\}$.    (18.29)

The diagnosis contains all faults that belong to the set of possible faults
defined on the basis of the values of all diagnostic signals that belong to
the subset $S^1$. The obtained accuracy and credibility of the diagnosis are the
highest possible ones, but the time of its development is not the shortest
possible one.

The formulation of the diagnosis ends the fault isolation process. A
diagnosis formulated at the moment $t = \theta^r$ concerns a fault detected
at the moment $t^1$ which, however, had appeared even earlier. Comparing
particular kinds of diagnoses, it is possible to ascertain that if no
inconsistency of diagnostic signal values appeared during the inference
process, then $DGN(A) = DGN(T_D)$, and that the time of deriving the diagnosis
$DGN(T_D)$ is shorter than or equal to the time of formulating the diagnosis
$DGN(A)$, while the probability that the diagnosis $DGN(A)$ is false is lower
than or equal to the probability that the diagnosis $DGN(T_D)$ is false. The
equalities hold if b = p.

Properties of the algorithm. The above algorithm ensures correct diag- 
nosing results in the case of the appearance of single faults, as long as the 
diagnostic signal values are true, the binary diagnostic relation is defined cor- 
rectly, and the declared symptom times are not shorter than the real ones. 

One should stress, however, that the requirement for the existence of 
single faults is fulfilled not only in the case of system states with single 
faults, i.e., in states in which each next fault appears only after the state of 
the complete efficiency of the system has been restored. The requirement is 
also fulfilled when each next fault appears after the preceding fault has been 
identified, even if at a given moment more faults exist. In such a case, the 
system state change that occurs between the successive diagnoses concerns 
one fault only. Only this fault is isolated by the diagnostic system, which 
remembers the preceding state of the system. It should be stressed that the 
accuracy of the diagnosis for successive faults can be lower due to necessary 
modifications of the set of available diagnostic signals after each one of the 
diagnoses has been derived. 

Not each one of the states with two faults that appear simultaneously or
within a short time interval between two successive diagnoses leads to
inconsistency detected during the diagnostic inference process. Diagnoses
formulated by the DTS method on the assumption about the occurrence of single
faults are correct despite the simultaneous occurrence of two faults, $f_a$ and
$f_b$, if the subsets of diagnostic signals that are useful for their isolation
are disjoint:

$S^a \cap S^b = \emptyset$,    (18.30)

where the subsets of diagnostic signals that are useful for isolating the faults
$f_a$ and $f_b$ are defined as follows:

$S^a = \{s_j \in S_n : F(s_a) \cap F(s_j) \neq \emptyset\}$,    (18.31)

$S^b = \{s_j \in S_n : F(s_b) \cap F(s_j) \neq \emptyset\}$.    (18.32)

The requirement can be extended to any number of faults occurring
simultaneously. They will be correctly isolated by the DTS method in successive
inference processes, as long as the subsets of diagnostic signals that are
useful for their isolation are disjoint. Therefore, it is advisable to design
diagnostic software in such a way that these independent inference processes
are realised in parallel.
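
Whether two isolation processes can safely run in parallel can be checked with a few lines of Python. This is a sketch under assumptions: the function names `isolation_signals` and `independent`, the mapping `faults_of`, and the sample data are hypothetical, and each process is identified by the signal that first detected its fault.

```python
def isolation_signals(first_signal, active_signals, faults_of):
    """Signals sharing at least one fault with the first-detecting signal."""
    seed = faults_of[first_signal]
    return {s for s in active_signals if seed & faults_of[s]}


def independent(first_a, first_b, active_signals, faults_of):
    """True if the two signal subsets are disjoint."""
    sa = isolation_signals(first_a, active_signals, faults_of)
    sb = isolation_signals(first_b, active_signals, faults_of)
    return sa.isdisjoint(sb)


faults_of = {
    "s1": {"f1", "f2"},
    "s2": {"f2", "f3"},
    "s3": {"f4", "f5"},
    "s4": {"f5"},
}
signals = list(faults_of)
separate = independent("s1", "s3", signals, faults_of)
overlapping = independent("s1", "s2", signals, faults_of)
```

Processes started by s1 and s3 use disjoint signal subsets, whereas those started by s1 and s2 share signals and could therefore disturb each other.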

The diagnosis is correct also when simultaneous faults are indistinguishable.
A subset of indistinguishable faults is indicated in such a case. It contains
all faults that appeared after the preceding diagnosis had been generated.
The inconsistency during the inference process appears when, for the
isolation of the fault $f_a$, a symptom $s_j = 1$ is applied that is not caused
by the fault $f_a$, detected at the moment $t^1$, but by another fault $f_b$, which
does not belong to the set $F^1$. The requirement (18.30) is not fulfilled in such
a case. Such a situation can occur if multiple faults appear in neighbouring
elements of the system (within one subsystem).

18.5.2. F-DTS method 

A simple fault isolation algorithm using the F-DTS method is based on the
following inference principles:

Fault isolation procedure initiation. The detection of the first symptom
$s_j$ at the moment $t = t^1$ triggers the fault isolation process, provided
that the degree of membership of the signal in any negative value (let us
denote the positive value by zero) is higher than a certain fixed value M
(for instance, M = 0.6).

Definition of the subsystem in which the fault appeared. Initially,
the set of possible faults $F^1$ is defined:

$(t = t^1) \wedge (s_j \neq 0) \wedge (\mu_{ji} > M) \Rightarrow F^1 = F(s_j)$,    (18.33)

where

$F(s_j) = \{f_k : r(f_k, s_j) \neq 0\}$.    (18.34)

Let us denote the power (cardinality) of the set $F^1$ by p. States that are
considered during the fault isolation process do not concern the complete
system but only its part defined by faults that belong to the set $F^1$:

$z_i = \{z(f_k) : f_k \in F^1\}$.    (18.35)

Let us consider the subset of states that contains states with single faults as
well as the state of efficiency:

$Z^1 = z_0 \cup \{z_i : \sum_k z(f_k) = 1\}$,  $|Z^1| = p + 1$.    (18.36)

In the case of inference with the use of the DTS method, the appearance
of the first symptom eliminates the possibility of the existence of the state
of efficiency. In the F-DTS method, because of symptom uncertainty, the state
of efficiency can be more probable than states with single faults; therefore
it is also taken into account.

The set of diagnostic signals that are useful for fault isolation is as
follows:

$S^1 = \{s_j \in S_n : F^1 \cap F(s_j) \neq \emptyset\}$.    (18.37)

The set of possible faults $F^1$, together with the set of tests $S^1$ that are
useful for their isolation, creates the subset of the FIS (Fault Isolation
System) that is applied to fault isolation in a given inference process. In
(Koscielny et al., 1999), a slightly different algorithm for creating these
sets was given.

Definition of a limit for the diagnosing time as well as the subset
of diagnostic signals that fulfil the requirement. In the case of parallel
inference, it is assumed that all diagnostic signals that belong to the set
$S^1$ are used. The limit time

$\tau^1 = \min_{k:\, f_k \in F^1} \tau_k$    (18.38)

is not modified any further.

The set of diagnostic signals $S^{1*}$ that fulfil the requirement is calculated
according to the following formula:

$S^{1*} = \{s_j \in S^1 : \theta_j < \tau^1\}$.    (18.39)

The time necessary for settling the values of all diagnostic signals that belong
to the set $S^{1*}$ is defined during the next step:

$\theta^{1*} = \max_{j:\, s_j \in S^{1*}} \{\theta_j\} < \tau^1$.    (18.40)

Formulating diagnosis with a limit for the diagnosing time. Inference
on the basis of a comparison of consistency between the current values of
diagnostic signals and signatures in a dynamically created subset of the FIS
is carried out after the time necessary for settling the diagnostic signal values
has elapsed (at the moment $t = \theta^{1*}$). Different operators can be used
during fuzzy inference.

The diagnosis generated before the limit for the diagnosing time has
elapsed has the form of a set of pairs: the system state and the certainty
coefficient of its occurrence. It shows states whose certainty coefficient
values are higher than the accepted threshold value K:

$DGN(\tau) = \{\langle z_m, \mu_m \rangle : (z_m \in Z^1) \wedge (\mu_m > K)\}$,    (18.41)

where $\mu_m$ denotes the firing coefficient of the rule that shows the m-th
fault, and is defined for diagnostic signals that belong to the set $S^{1*}$
(18.39). The initial diagnosis allows us to begin the system protection action.

Final diagnosis formulation. The final diagnosis that has the highest possible
accuracy and credibility can be generated after the values of all signals
that are useful for fault isolation have settled. The time is defined according
to the formula

$\theta^1 = \max_{j:\, s_j \in S^1} \{\theta_j\}$. (18.42)

A further reduced subset of the FIS is considered in the subsequent inference.
The subset is defined by system states indicated in the diagnosis (18.41), as
well as by diagnostic signals that remain to be analysed:

$s_j \in (S^1 - S^{1*})$. (18.43)



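New certainty coefficients (firing coefficients for rules that correspond to
them) are calculated for all states that appear in the diagnosis, taking into
account premises that are fulfilled for all of the signals $s_j \in (S^1 - S^{1*})$. They
are calculated as follows:

• for the PROD operator:

$\mu_{Rm} = \mu_m \prod_{j:\, s_j \in (S^1 - S^{1*})} \mu_{mj}$, (18.44)

• for the MIN operator:

$\mu_{Rm} = \mathrm{MIN}\left(\mu_m, \mathrm{MIN}_{j:\, s_j \in (S^1 - S^{1*})}\, \mu_{mj}\right)$, (18.45)

• for the MEAN operator:

$\mu_{Rm} = \frac{1}{|S^1|} \left( \mu_m |S^{1*}| + \sum_{j:\, s_j \in (S^1 - S^{1*})} \mu_{mj} \right)$. (18.46)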



The final diagnosis has the form

$DGN(A) = \{\langle z_m, \mu_{Rm} \rangle : [z_m \in DGN(\tau)] \wedge [\mu_{Rm} > K]\}$. (18.47)
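The recalculation of a certainty coefficient after the remaining signals have settled can be sketched as follows. This is an illustration only: the function name and arguments are hypothetical, and the MEAN variant assumes that the initial coefficient was itself a mean over the signals already used, so that the result is a mean over all useful signals.

```python
from math import prod


def final_certainty(mu_initial, mu_remaining, n_initial, operator="PROD"):
    """Combine the initial firing coefficient with the degrees of the
    premises for the signals analysed later (assumed reading of the
    PROD, MIN and MEAN variants).

    mu_initial   - coefficient from the time-limited diagnosis
    mu_remaining - membership degrees for the remaining signals
    n_initial    - number of signals used for the initial diagnosis
    """
    if operator == "PROD":
        return mu_initial * prod(mu_remaining)
    if operator == "MIN":
        return min([mu_initial, *mu_remaining])
    if operator == "MEAN":
        total = n_initial + len(mu_remaining)
        return (mu_initial * n_initial + sum(mu_remaining)) / total
    raise ValueError(f"unknown operator: {operator}")


for name in ("PROD", "MIN", "MEAN"):
    print(name, final_certainty(0.8, [0.5], n_initial=3, operator=name))
```

A state stays in the final diagnosis only if the recalculated coefficient still exceeds the threshold $K$.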
Thanks to applying fuzzy logic, the F-DTS method allows one to take
the uncertainty of symptoms into account. The application of the information
system instead of the binary diagnostic matrix leads to higher fault
distinguishability.

18.5.3. T-DTS method 

18.5.3.1. Expansion of the FIS definition 

The information system adapted to the needs of fault isolation and called
the fault isolation system has been discussed in Chapter 2. Let us define an
expanded version of the system, denoted by $FIS^T$, by

$FIS^T = \langle F, S, V_S, T, r, q \rangle$, (18.48)

where $F$, $S$, $V_S$ and $r$ are defined by equations given in Chapter 2.

The set of values that correspond to each one of the diagnostic signals
can be divided into the following two subsets: the subset of positive
values $V_{jP}$, and the subset of negative values $V_{jN}$ denoting fault symptoms:

$\forall_j \; V_j = V_{jP} \cup V_{jN}$. (18.49)

The subset of positive values is a one-element one. The set

$T = \{\theta^i_{kj} : i = 1, 2;\ k, j : r(f_k, s_j) \subset V_{jN}\}$ (18.50)

is the finite set of time parameters (expressed in seconds or dimensionless
units that correspond to multiples of the signal sampling time).

The function $q$ assigns two values belonging to the set of time $T$ to
each one of the fault-symptom pairs:

$\forall_{k,j} \; [r(f_k, s_j) \subset V_{jN}] \Rightarrow [q(f_k, s_j) = \{\theta^1_{kj}, \theta^2_{kj}\}]$. (18.51)

Here $\theta^1_{kj}$ is the minimum time interval between the appearance of the k-th
fault and the occurrence of the j-th symptom, and $\theta^2_{kj}$ is the maximum time
interval between the occurrence of the k-th fault and the occurrence of the
j-th symptom. The time of detecting the k-th fault by the j-th diagnostic
signal belongs therefore to the range $\theta_{kj} \in [\theta^1_{kj}, \theta^2_{kj}]$.

18.5.3.2. Fault distinguishability in the expanded FIS 

Fault distinguishability conditions in the expanded FIS are an expansion of
conditions that have been given for the FIS in Chapter 3. The faults $f_k, f_m \in F$
are unconditionally indistinguishable in the expanded FIS with respect to the
diagnostic signal $s_j \in S$ if and only if

$[V_{kj} = V_{mj}] \wedge \left[ [\theta^1_{kj}, \theta^2_{kj}] = [\theta^1_{mj}, \theta^2_{mj}] \right]$, (18.52)

i.e., if the subset of the values of the function $r$ for the faults $f_k$ and $f_m$
as well as for the j-th diagnostic signal is identical, $V_{kj} = V_{mj}$, and the
minimum and maximum time intervals between each fault appearance and the
occurrence of the symptom are identical.

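The faults $f_k, f_m \in F$ are conditionally indistinguishable in the expanded FIS
with respect to the diagnostic signal $s_j \in S$ if and only if the following
condition is fulfilled:

$[V_{kj} \cap V_{mj} \ne \emptyset] \wedge \left[ [\theta^1_{kj}, \theta^2_{kj}] \cap [\theta^1_{mj}, \theta^2_{mj}] \ne \emptyset \right]$. (18.53)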

Conditional indistinguishability means that these faults may or may not be
distinguishable, depending on the obtained diagnostic signal values and on the
moments of the appearance of fault symptoms. This can be ascertained only
during the system state monitoring process. Only the condition (18.53) can
be examined at the design stage.

Let us assume that $\theta^1_{kj} < \theta^1_{mj} < \theta^2_{kj} < \theta^2_{mj}$. Therefore, the common
part of the symptom appearance time intervals equals $(\theta^1_{mj}, \theta^2_{kj})$. If a fault
symptom appears at a time shorter than $\theta^1_{mj}$, i.e., within the time interval
$(\theta^1_{kj}, \theta^1_{mj})$, it can only be the k-th fault's symptom. Similarly, if a symptom
appears after a time longer than $\theta^2_{kj}$, it means that it was caused by the
m-th fault.
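The reasoning above, for two faults with partly overlapping symptom time intervals, can be illustrated by a short sketch. The fault names and interval values are hypothetical.

```python
def faults_consistent_with_time(t, intervals):
    """Return the faults whose symptom time interval contains the observed
    symptom time t. intervals[f] = (minimum time, maximum time)."""
    return {f for f, (lo, hi) in intervals.items() if lo <= t <= hi}


# Two faults that produce the same symptom value on one diagnostic signal;
# the interval bounds satisfy 2 < 4 < 6 < 9, as in the ordering assumed above.
intervals = {"f_k": (2, 6), "f_m": (4, 9)}

early = faults_consistent_with_time(3, intervals)   # before the overlap
common = faults_consistent_with_time(5, intervals)  # inside the overlap
late = faults_consistent_with_time(8, intervals)    # after the overlap
print(early, sorted(common), late)
```

Only a symptom time inside the common part leaves both faults possible, which is why the two faults are only conditionally indistinguishable.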

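The fault indistinguishability condition during the current diagnosing at
the moment of the occurrence of the symptom $t_j$ takes the shape

$[v_j \in V_{kj} \cap V_{mj}] \wedge \left[ t_j \in [\theta^1_{kj}, \theta^2_{kj}] \cap [\theta^1_{mj}, \theta^2_{mj}] \right]$. (18.54)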

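On the other hand, the faults $f_k, f_m \in F$ are distinguishable if the following
condition is fulfilled:

$[v_j \notin (V_{kj} \cap V_{mj})] \vee \left[ t_j \notin [\theta^1_{kj}, \theta^2_{kj}] \cap [\theta^1_{mj}, \theta^2_{mj}] \right]$. (18.55)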

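The faults $f_k, f_m \in F$ are unconditionally indistinguishable in the expanded FIS
with respect to every diagnostic signal $s_j \in S$ if and only if

$\forall_{s_j \in S} \left\{ [r(f_k, s_j) = r(f_m, s_j)] \wedge \left[ [\theta^1_{kj}, \theta^2_{kj}] = [\theta^1_{mj}, \theta^2_{mj}] \right] \right\}$, (18.56)

which can be written as $f_k R_N f_m$.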

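The faults $f_k, f_m \in F$ are conditionally indistinguishable in the expanded FIS
if the following condition is fulfilled:

$\forall_{s_j \in S} \left\{ [r(f_k, s_j) \cap r(f_m, s_j) \ne \emptyset] \wedge \left[ [\theta^1_{kj}, \theta^2_{kj}] \cap [\theta^1_{mj}, \theta^2_{mj}] \ne \emptyset \right] \right\}$
$\wedge \; \exists_{s_j \in S} \left\{ [r(f_k, s_j) \ne r(f_m, s_j)] \vee \left[ [\theta^1_{kj}, \theta^2_{kj}] \ne [\theta^1_{mj}, \theta^2_{mj}] \right] \right\}$, (18.57)

which can be written as $f_k R_{WN} f_m$.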
The fault indistinguishability condition during the current diagnosis
takes the shape

$\forall_{s_j \in S} \left\{ [v_j \in V_{kj} \cap V_{mj}] \wedge \left[ t_j \in [\theta^1_{kj}, \theta^2_{kj}] \cap [\theta^1_{mj}, \theta^2_{mj}] \right] \right\}$. (18.58)



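On the other hand, the faults $f_k, f_m \in F$ are distinguishable if the following
condition is fulfilled:

$\exists_{s_j \in S} \left\{ [v_j \notin (V_{kj} \cap V_{mj})] \vee \left[ t_j \notin [\theta^1_{kj}, \theta^2_{kj}] \cap [\theta^1_{mj}, \theta^2_{mj}] \right] \right\}$. (18.59)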



Any two faults are unconditionally distinguishable in the expanded FIS if there exists
a diagnostic signal for which the subsets of values that correspond to these
faults are disjoint, or the symptom appearance time intervals are disjoint:

$\exists_{s_j \in S} \left\{ [r(f_k, s_j) \cap r(f_m, s_j) = \emptyset] \vee \left[ [\theta^1_{kj}, \theta^2_{kj}] \cap [\theta^1_{mj}, \theta^2_{mj}] = \emptyset \right] \right\}$. (18.60)



Let us notice that fault distinguishability during the monitoring of the
system state can be higher than distinguishability ascertained at the design
stage. Taking into account the times of symptom appearance improves fault
distinguishability in comparison with the FIS information system.
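The design-stage conditions for a pair of faults can be checked mechanically. The sketch below is an illustration under stated assumptions: pattern values are stored as Python sets, symptom time intervals as closed `(min, max)` tuples, and a pair without a symptom simply has no interval; the function names and the sample data are hypothetical.

```python
def overlap(a, b):
    """True if the closed intervals a and b share at least one point."""
    return max(a[0], b[0]) <= min(a[1], b[1])


def classify_pair(fk, fm, r, q, signals):
    """Classify two faults using pattern values r and time intervals q."""
    identical = True
    for s in signals:
        vk, vm = r[fk, s], r[fm, s]
        ik, im = q.get((fk, s)), q.get((fm, s))
        if not (vk & vm):                       # disjoint value subsets
            return "unconditionally distinguishable"
        if ik is not None and im is not None and not overlap(ik, im):
            return "unconditionally distinguishable"   # disjoint time intervals
        if vk != vm or ik != im:
            identical = False
    return ("unconditionally indistinguishable" if identical
            else "conditionally indistinguishable")


signals = ["s1"]
r = {("fa", "s1"): {1}, ("fb", "s1"): {1}, ("fc", "s1"): {1},
     ("fd", "s1"): {1}, ("fe", "s1"): {-1}}
q = {("fa", "s1"): (2, 6), ("fb", "s1"): (2, 6), ("fc", "s1"): (4, 9),
     ("fd", "s1"): (7, 9), ("fe", "s1"): (2, 6)}

for other in ("fb", "fc", "fd", "fe"):
    print("fa", other, classify_pair("fa", other, r, q, signals))
```

The pair with identical values but disjoint time intervals is distinguishable only because symptom times are taken into account; in the FIS without time parameters it would remain indistinguishable.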



18.5.3.3. Fault isolation using the T-DTS method 

The principles of fault isolation on the basis of the expanded FIS, taking into
account the dynamics of the occurrence of symptoms, are presented below. It
is assumed that the inference process is carried out on the assumption about
the existence of single faults. Series inference is applied, which consists in
generating instantaneous diagnoses that are specified as a result of the
observation of successive diagnostic signals, taking into account the times of the
occurrence of symptoms. The limit for the diagnosing time has been neglected,
since it can be taken into account in a similar way as in the DTS method.

Below only general principles of inference by the T-DTS method are
presented. The difference in comparison with the DTS and F-DTS methods
consists in the fact that the reduction of the set of possible faults on the
grounds of each diagnostic signal's value contains two stages. At the first one,
the consistency of the obtained diagnostic signal values with pattern values
that are defined by the function $r(f, s)$ is used. At the second one, the
consistency of the time of the occurrence of the symptom with the minimum and
maximum time interval that is defined by the function $q(f, s)$ is examined.

The inference is carried out as follows: 

Initiating the fault isolation procedure. A negative value of any
diagnostic signal that belongs to the set $S_n$, ascertained during cyclical fault
detection at the moment $t^1$, causes the initiation of the fault isolation
procedure. Initially, a hypothesis about the existence of a single fault is assumed.

Defining the subsystem in which a fault appeared. The primary set
of possible faults is created at the moment $t = 0$ after the first symptom
$v_j \in V_{jN}$ that testifies to the existence of a fault has been registered. The
minimum and maximum time intervals between the appearance of the k-th
fault and the occurrence of the first detected symptom $v^1_j \in V_{jN}$ are the
values that the function $q$ assigns to this fault-symptom pair.

Initially, the set of possible faults includes faults whose appearance causes
such a symptom, according to the function $r(f, s)$:

$s^1_j = v_j \in V_{jN} \Rightarrow DGN^*/s^1_j = \{f_k : r(f_k, s^1_j) \subset V_{jN}\}$, (18.61)

where $DGN^*/s^1_j$ denotes an instantaneous diagnosis elaborated using the
FIS without taking symptom times into account, on the condition that the
first diagnostic signal value has been used.

However, we should exclude from the set $DGN^*/s^1_j$ faults whose
appearance must cause, according to the known symptom times, first of all
symptoms different than the observed one:

$DGN/s^1_j = \{f_k \in DGN^*/s^1_j : \forall \ldots\}$, (18.62)

where $DGN/s^1_j$ denotes an instantaneous diagnosis elaborated using the FIS
with taking symptom times into account, on the condition that the value of
the first diagnostic signal $s^1_j$ has been used. The time bounds that appear in
the condition are the minimum and maximum values of the symptom time for
the diagnostic signal $s^1_j$.

The set of diagnostic signals that are useful for fault isolation has the
following shape:

$S^* = \{s_j : [r(f_k, s_j) \subset V_{jN}] \wedge [f_k \in DGN/s^1_j]\}$. (18.63)

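Defining a limit for the times of the occurrence of symptoms. The
limits for the times of the occurrence of successive symptoms that belong to
the set $S^*$ (counted from the moment the first symptom has been observed) are
calculated as follows for faults shown in the primary diagnosis, with the
index $p$ denoting the first detected signal:

$\tau^1_{kj} = \begin{cases} 0 & \text{if } \theta^1_{kj} - \theta^2_{kp} < 0, \\ \theta^1_{kj} - \theta^2_{kp} & \text{if } \theta^1_{kj} - \theta^2_{kp} \ge 0, \end{cases}$ (18.64)

$\tau^2_{kj} = \theta^2_{kj} - \theta^1_{kp}$. (18.65)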
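The limits for the time of a successive symptom, counted from the moment the first symptom was observed, follow from the intervals of both symptoms. The sketch below assumes the following reading: if the first symptom appears between `first[0]` and `first[1]` after the fault, and the next one between `later[0]` and `later[1]`, then the delay between them lies between the clipped difference of the lower and upper bounds and the difference of the upper and lower bounds. The function name and the numbers are hypothetical.

```python
def symptom_window(later, first):
    """Limits for the delay between the first observed symptom and a later one.

    later = (min, max) time of the later symptom after the fault
    first = (min, max) time of the first observed symptom after the fault
    """
    lower = max(0, later[0] - first[1])   # cannot precede the first symptom
    upper = later[1] - first[0]
    return lower, upper


print(symptom_window(later=(5, 9), first=(2, 4)))
print(symptom_window(later=(3, 5), first=(2, 4)))
```

In the second call the difference of the bounds is negative, so the lower limit is clipped to zero.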

Formulating instantaneous diagnoses and the final diagnosis. The
formulation of successive diagnoses takes place after each subsequent
symptom is registered, and each time after the maximum time interval of the
appearance of a symptom elapses. An instantaneous diagnosis is formulated
in two stages. At the first one, faults for which values of the r-th analysed
diagnostic signal $s^r_j$ are inconsistent with the pattern value in the FIS are
eliminated from the instantaneous diagnosis generated at the preceding step:

$DGN^*/s^1_j, \ldots, s^r_j = \{f_k \in DGN/s^1_j, \ldots, s^{r-1}_j : [v^r_j = r(f_k, s^r_j)]\}$. (18.66)

At the second one, faults for which the time of the occurrence of the symptom
is inconsistent with values defined by the function $q(f, s)$ are eliminated from
the diagnosis formulated at the first stage:



$DGN/s^1_j, \ldots, s^r_j = \{f_k \in DGN^*/s^1_j, \ldots, s^r_j : \ldots\}$, (18.67)

where $DGN/s^1_j, \ldots, s^r_j$ denotes the instantaneous diagnosis formulated
on the basis of a sequence of r diagnostic signals $s^1_j, \ldots, s^r_j$.
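One step of the two-stage reduction can be sketched as follows. This is not the exact condition of the method: the sketch assumes that the time check of the second stage means "the symptom delay lies within precomputed limits for the given fault and signal", and all names and data are hypothetical.

```python
def tdts_step(candidates, signal, value, delay, r, limits):
    """Reduce the set of possible faults using one diagnostic signal.

    r[f, s]      - set of pattern values of signal s for fault f
    limits[f, s] - (lower, upper) admissible delay of the symptom (assumed form)
    Returns the sets obtained after the value check and after the time check.
    """
    after_values = {f for f in candidates if value in r[f, signal]}
    after_times = {f for f in after_values
                   if limits[f, signal][0] <= delay <= limits[f, signal][1]}
    return after_values, after_times


r = {("f1", "s2"): {1}, ("f2", "s2"): {1}, ("f3", "s2"): {0}}
limits = {("f1", "s2"): (0, 3), ("f2", "s2"): (4, 8)}

stage1, stage2 = tdts_step({"f1", "f2", "f3"}, "s2", value=1, delay=2,
                           r=r, limits=limits)
print(sorted(stage1), sorted(stage2))
```

The value check removes the fault that should not produce the symptom at all, and the time check removes the fault whose symptom would have to appear later.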

It is very easy to take into account a limit for the diagnosing time in 
the algorithm of diagnosing using the T-DTS method. It is also possible to 
design an inference variant that minimises the diagnosing time, as well as a 
variant that ensures the highest possible credibility of the diagnosis. 



18.6. Fault isolation on the assumption about multiple faults 

If inconsistency is ascertained during a diagnosis carried out on the
assumption about single faults, one should assume that more faults appeared. A
diagnosing algorithm in which subsets of states that exist in a defined part of
the system are discussed in the order of the increasing number $m$ of faults,
beginning with $m = 2$, was presented in (Koscielny, 1991; 1995a; 1995b).
A simplified variant of multiple fault isolation for the DTS and the F-DTS
method is presented below. The inference algorithm for the T-DTS method on
the assumption about the existence of multiple faults has not been designed
yet.



18.6.1. DTS method 

The assumption about the existence of multiple faults is made if inconsistency
was detected during diagnostic inference (the condition $DGN = \emptyset$).

The following reasoning is applied: 

• the values of the obtained diagnostic signals are inconsistent with the 
signatures of the existing faults due to the existence of several faults, 

• if multiple faults appeared, then all symptoms of each one of the faults
appeared,

• in order to ascertain the possibility of the existence of a given fault, it
is sufficient to check whether or not all of the symptoms that correspond to
this fault have appeared.

Let us assume for simplicity that a limit for the diagnosing time is not
considered in such a case. The primary set of possible faults may not contain
all of the existing faults. Symptoms that have led to the inconsistency of
inference can be caused by other faults. After symptom times elapse for all
of the signals that belong to the set $S^1$ (18.10), their results are examined,
and an expanded set of possible faults is created:

$F^* = \bigcup_{j:\, (s_j \in S^1) \wedge (s_j = 1)} F(s_j)$. (18.68)

It contains all of the faults that are detected by the registered symptoms.
It is not only diagnostic signals that belong to the set $S^1$ that are useful for
the isolation of faults that belong to the set $F^*$, but also other signals that
detect these faults. The expanded set of signals useful for fault isolation is
created according to the following formula:

$S^* = \bigcup_{k:\, f_k \in F^*} S(f_k)$, (18.69)

where

$S(f_k) = \{s_j : \langle f_k, s_j \rangle \in R_{FS}\}$ (18.70)

denotes the set of diagnostic signals that should take the value "1" in the case
of the occurrence of the k-th fault.

Inference begins after the symptom times of all signals that belong to the
set $S^*$ have elapsed. The set $S(f_k)$ is created for each one of the faults
belonging to the set $F^*$, and it is checked whether all of the signals belonging
to this set take the value "1". If the condition is fulfilled, then the fault is
considered as one of possible faults. The diagnosis indicates the set of all
faults that fulfil this condition:

$DGN = \{f_k \in F^* : \forall_{j:\, s_j \in S(f_k)} \; s_j = 1\}$. (18.71)

The set of faults defined in the diagnosis contains not only the existing faults 
but also 

• faults indistinguishable from the existing ones (with the set of
diagnostic signals used during the inference process),

• faults $f_m$ whose subsets $S(f_m)$ are contained in the union of the sets
$S(f_k)$ of the existing faults.
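The simplified multiple-fault inference of (18.68)-(18.71) can be sketched in a few lines. The detection sets below are an assumption: they encode the binary diagnostic matrix of the three-tank example as read from Table 18.2 and the worked examples, with faults and signals represented by their numbers; the function names are hypothetical.

```python
# F_OF_SIGNAL[j] = set of faults detected by signal s_j (assumed reading of Table 18.2).
F_OF_SIGNAL = {
    1: {1, 5, 6, 7, 8},
    2: {1, 2, 3, 9, 12},
    3: {2, 3, 4, 9, 10, 13},
    4: {3, 4, 10, 11, 14},
    5: {1, 2, 3, 4, 11, 12, 13, 14},
}


def signals_of(fault):
    """S(f_k): signals that should take the value 1 when the fault occurs."""
    return {j for j, faults in F_OF_SIGNAL.items() if fault in faults}


def multiple_fault_diagnosis(values, s1):
    """values[j] is the binary value of s_j; s1 is the primary signal set."""
    f_star = set().union(*(F_OF_SIGNAL[j] for j in s1 if values[j] == 1))
    s_star = set().union(*(signals_of(k) for k in f_star))
    dgn = {k for k in f_star if all(values[j] == 1 for j in signals_of(k))}
    return f_star, s_star, dgn


# Symptoms s2, s4 and s5 observed; s1 and s3 stay at zero.
values = {1: 0, 2: 1, 3: 0, 4: 1, 5: 1}
f_star, s_star, dgn = multiple_fault_diagnosis(values, s1={2, 3, 4, 5})
print(sorted(f_star), sorted(s_star), sorted(dgn))
```

With this data the diagnosis contains a fault that has the same signature as one of the others, which illustrates the first remark above: faults indistinguishable from the existing ones are also indicated.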

If possible faults have not been indicated as a result of inference carried out
on the assumption about multiple faults, then an assumption is made about
symptom inconsistency due to, for instance, measurement disturbances, etc.
In such a case, an approximate diagnosis is formulated. All of the signals that
belong to the set $S^{1*}$ (18.12) are used for formulating the diagnosis. The
subset of all possible faults $F^1$ (18.7) is considered. Inconsistency indices of
the obtained and pattern signals are defined for faults that belong to this
subset:

$N_k = \sum_{j:\, s_j \in S^{1*}} |s_j - r(f_k, s_j)|$. (18.72)

The diagnosis indicates faults for which the coefficient $N$ value is the lowest:

$DGN(N) = \{f_k : N_k = \min_{m:\, f_m \in F^1} N_m\}$. (18.73)

The above way of inferring is applied if false results of particular diagnostic
signals appear, and the set of all signals is inconsistent with the signatures of
particular faults. For instance, such a situation will appear in the three-tank
set if the diagnostic signal values are as follows: $s_1 = 1$, $s_2 = 0$, $s_3 = 1$,
$s_4 = 0$, $s_5 = 0$.
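An approximate diagnosis based on an inconsistency index can be sketched as follows. Two assumptions are made: the index counts the signals whose observed binary value differs from the pattern value, and the detection sets encode the three-tank matrix as read from Table 18.2 and the worked examples. In the method, the index is computed over the primary sets of faults and signals; the call below passes all faults and all signals purely for illustration.

```python
# F_OF_SIGNAL[j] = set of faults detected by signal s_j (assumed reading of Table 18.2).
F_OF_SIGNAL = {
    1: {1, 5, 6, 7, 8},
    2: {1, 2, 3, 9, 12},
    3: {2, 3, 4, 9, 10, 13},
    4: {3, 4, 10, 11, 14},
    5: {1, 2, 3, 4, 11, 12, 13, 14},
}


def inconsistency_index(fault, values, signals):
    """Number of signals whose value differs from the fault signature."""
    return sum(abs(values[j] - (1 if fault in F_OF_SIGNAL[j] else 0))
               for j in signals)


def approximate_diagnosis(faults, values, signals):
    """Return the faults with the lowest index, together with that index."""
    indices = {k: inconsistency_index(k, values, signals) for k in faults}
    lowest = min(indices.values())
    return {k for k, n in indices.items() if n == lowest}, lowest


observed = {1: 1, 2: 0, 3: 1, 4: 0, 5: 0}
closest, index = approximate_diagnosis(range(1, 15), observed, signals=range(1, 6))
print(sorted(closest), index)
```

No signature matches the observed values exactly, so the lowest index is greater than zero.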

It should be noticed that in particular cases, if multiple faults appear, 
then a false diagnosis can be formulated that indicates single faults, and in- 
consistency in diagnostic inference is not ascertained. Such a diagnosis can be 
formulated if this single fault’s signature is consistent with the joint signature 
of the state with multiple faults that exist in the system. 

18.6.2. F-DTS method 

The assumption that multiple faults exist is made if all of the states with
single faults as well as the state of efficiency (belonging to the set $Z^1$) have
a low certainty coefficient (e.g., lower than $K = 0.3$). The diagnosis (18.46)
is then an empty set: $DGN(A) = \emptyset$.

Reasoning in inference is similar to that in the DTS method. It is assumed
that if multiple faults have appeared, then all symptoms of each one of the
faults should also appear (i.e., the coefficients of the membership of each
signal to the pattern value that is different than zero should be high). However,
the inconsistency of signals with pattern values that equal zero should appear
due to the existence of several faults.

Let us assume for simplicity that a limit for the diagnosing time is not
considered in such a case. The primary set of possible faults may not contain
all of the existing faults. Symptoms that have led to the inconsistency of
inference can be caused by other faults. After symptom times elapse for all
of the signals that belong to the set $S^1$, their results are examined, and an
expanded set of possible faults is created:

$F^* = \bigcup F(s_j)$. (18.74)

It contains all of the faults that are detected by the registered symptoms.
It is not only diagnostic signals that belong to the set $S^1$ that are useful for
the isolation of faults that belong to the set $F^*$, but also other signals that
detect these faults. The expanded set of signals useful for fault isolation is
created according to the following formula:

$S^* = \bigcup_{k:\, f_k \in F^*} S(f_k)$, (18.75)

where

$S(f_k) = \{s_j : r(f_k, s_j) \ne 0\}$ (18.76)

denotes the set of diagnostic signals that detect the k-th fault.

The inference begins after the symptom times of all signals that belong
to the set $S^*$ elapse. The set $S(f_k)$ is created for each one of the faults
belonging to the set $F^*$, and the function of the membership of values of
particular signals that belong to the set $S(f_k)$ to their pattern values in
the FIS is defined. By using one of the t-norm operators, it is possible to
calculate the fulfilment degree for all premises. It is interpreted as the degree
of the occurrence of a given fault (on the assumption about the existence of
multiple faults). If the PROD operator is used, it is calculated according to
the formula

$\mu_k = \prod_{j:\, s_j \in S(f_k)} \mu_{kj}$. (18.77)

The diagnosis indicates faults for which the degree of certainty of their
existence is higher than the defined threshold value:

$DGN(A) = \{\langle f_k, \mu_k \rangle : [f_k \in F^*] \wedge [\mu_k > K]\}$. (18.78)

The advantage of the presented inference method is a very high reduction 
of the number of the analysed states of the system. The diagnosis about 
multiple faults is developed on the basis of a comparison of the observed 
symptoms and sets of symptoms that correspond to particular faults. 
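The fuzzy variant of the multiple-fault inference with the PROD operator can be sketched as follows. The data layout, fault names and membership degrees are hypothetical; each fault is given the degrees with which its detecting signals match their non-zero pattern values.

```python
from math import prod


def fdts_multiple_faults(membership, threshold):
    """Return {fault: degree} for the faults whose product of membership
    degrees over their detecting signals exceeds the threshold."""
    degrees = {f: prod(m.values()) for f, m in membership.items()}
    return {f: d for f, d in degrees.items() if d > threshold}


membership = {
    "f11": {"s4": 0.9, "s5": 0.8},
    "f3": {"s2": 0.9, "s3": 0.1, "s4": 0.9, "s5": 0.8},
}
diagnosis = fdts_multiple_faults(membership, threshold=0.3)
print(diagnosis)
```

A single weak premise is enough to remove a fault when the product is used; the MIN operator would behave similarly, while a mean would be more tolerant.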



18.7. Modification of the set of available diagnostic signals 

Each diagnosis is formulated with the use of a defined subset of diagnostic 
signals, but some of them take negative, i.e., different than zero, values. The 
signals detect faults indicated in the diagnosis, and because of that, their 
values do not change until a fault has been eliminated. The signals should 
not therefore be used in successive inference processes. 

In all of the discussed methods, a temporary suspension of the availability
of diagnostic signals that control faults indicated in the diagnosis occurs after
the diagnosis has been derived. The set of available signals therefore changes
and is as follows:

$S^*_{n+1} = S_n - \{s_j \in S^1_n : s_j \ne 0\}$, (18.79)

where $S^1_n$ denotes the set of diagnostic signals that have been used for the
formulation of the diagnosis.

Suspending signal availability does not break the cyclic realisation of the 
test. Such a break can take place only as a result of a change of the structure 
of the diagnosed system, which results in a lack of the possibility or usefulness 
of performing the test. The suspended diagnostic signals are still calculated, 
and if their value becomes positive, then they are included anew in the set of 
available signals. Therefore, the signals suspended in the preceding isolation 
processes, assuming that the signal values are positive, are also included in 
the set in each cycle of the modification of the set of available diagnostic 
signals after the signals have been eliminated according to the rule (18.79): 

$S_{n+1} = S^*_{n+1} \cup \{s_j \in (S - S_n) : s_j = 0\}$. (18.80)

A change of the set of available diagnostic signals leads to the change of the
set of detected faults, according to the formula:

$F_{n+1} = \bigcup_{j:\, s_j \in S_{n+1}} F(s_j) = \{f_k : k = 1, 2, \ldots, K_n\} \subset F$. (18.81)

If a recognised fault is related to a measurement line, then the set of available 
process variables is also appropriately reduced. 
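One cycle of the modification of the set of available signals can be sketched as follows. The function names and data are hypothetical, signals and faults are represented by numbers, and the detection sets are an assumed reading of the three-tank matrix of Table 18.2.

```python
def update_available(all_signals, available, used, values):
    """Suspend used signals that still show a symptom, then restore
    previously suspended signals whose value has returned to zero."""
    suspended_now = {s for s in used if values[s] != 0}
    remaining = available - suspended_now
    restored = {s for s in all_signals - available if values[s] == 0}
    return remaining | restored


def detected_faults(available, faults_of_signal):
    """Faults that can still be detected with the available signals."""
    return set().union(*(faults_of_signal[s] for s in available))


F_OF_SIGNAL = {1: {1, 5, 6, 7, 8}, 2: {1, 2, 3, 9, 12},
               3: {2, 3, 4, 9, 10, 13}, 4: {3, 4, 10, 11, 14},
               5: {1, 2, 3, 4, 11, 12, 13, 14}}

# Signal 5 was suspended earlier and is now back at zero;
# signals 3 and 4 have just indicated a fault.
new_available = update_available(
    all_signals={1, 2, 3, 4, 5}, available={1, 2, 3, 4},
    used={2, 3, 4}, values={1: 0, 2: 0, 3: 1, 4: 1, 5: 0})
still_detected = detected_faults(new_available, F_OF_SIGNAL)
print(sorted(new_available), sorted(still_detected))
```

In this example one fault is no longer covered by any available signal, which illustrates the loss of detectability mentioned in the text.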

The recognition of several faults in a short time interval is followed by 
a significant reduction of the set of available diagnostic signals. It can cause 
a lack of the possibility of detecting other faults. The accuracy of successive 
diagnoses also becomes lower. In practice, such situations are very rare. Faults 
do not occur often, and time necessary for system elements to be repaired or 
replaced is short. 

However, there are system structure changes that have a different char- 
acter. The changes can be caused, for instance, by switching particular parts 
of the installation off or on for maintenance reasons, or by the change of the 
stock of manufactured items. Such changes must be planned during the de- 
sign of the diagnostic system. They lead to specified modifications of the sets 
of process variables, performed tests, and possible faults. Such changes are 
realised in the DTS method by a separate procedure that changes the state 
of activity (which does not depend on the state of availability) of particular 
process variables and tests, i.e., the set of detected faults. 



18.8. Fault identification 
18.8.1. DTS and T-DTS methods 

Fault identification consists in evaluating the fault size. The set of diagnostic
signals that detect the k-th fault $S(f_k)$ is defined by the equation (18.70)
or (18.76). If diagnostic signals are generated on the basis of residual value
evaluation, then the ratio of the value of a given residual $r_j$ to its threshold
value $K_j$ is the elementary index of the symptom size that testifies to the
fault size. It is possible to accept, for instance, a mean value of elementary
indices for all of the residuals that correspond to signals belonging to the set
$S(f_k)$ as the measure of the fault size:

$\psi(f_k) = \frac{1}{|S(f_k)|} \sum_{j:\, s_j \in S(f_k)} \frac{r_j}{K_j}$. (18.82)



Residual values at the moment of diagnostic inference or values averaged in 
the window having a specified length can be included in this equation. Sim- 
ilar methods of symptom size evaluation can also be introduced for other 
detection methods, and fault sizes can be evaluated according to the equa- 
tion (18.82). 
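The fault size measure of (18.82) is a mean of residual-to-threshold ratios. In the sketch below the absolute value of the residual is used, which is an assumption made so that residuals of both signs contribute in the same way; the names and numbers are hypothetical.

```python
def fault_size(residuals, thresholds):
    """Mean ratio of residual magnitude to its threshold over the signals
    that detect the fault. Both arguments are dicts keyed by signal."""
    ratios = [abs(residuals[j]) / thresholds[j] for j in residuals]
    return sum(ratios) / len(ratios)


size = fault_size(residuals={"s3": 0.6, "s4": -0.9},
                  thresholds={"s3": 0.2, "s4": 0.3})
print(round(size, 6))
```

A value close to one means the residuals barely exceed their thresholds; larger values indicate a larger fault.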



18.8.2. F-DTS Method 

Fuzzy logic is used for fault size evaluation in the F-DTS method. Val- 
ues of residuals that correspond to diagnostic signals belonging to the sets 
S{fk) (18.76) and detecting these faults are analysed for all faults indicated 
in the diagnosis. The absolute value of these residuals is subject to fuzzy 
evaluation (Fig. 18.2). 




Fig. 18.2. Fuzzy evaluation of a residual's absolute value



It is sufficient to separate only one fuzzy set (having the linguistic
interpretation "significant fault") that has the shape of a triangle or a trapezoid as
shown in Fig. 18.2, for each one of the residuals. The degree of the function
$\mu_j(f_k)$ of membership to this set is a partial measure of the fault size. One
can accept the mean value of the membership function $\mu_j(f_k)$ for all of the
residuals that correspond to signals belonging to the set $S(f_k)$ as a fault size
measure:

$\psi(f_k) = \frac{1}{|S(f_k)|} \sum_{j:\, s_j \in S(f_k)} \mu_j(f_k)$. (18.83)

It should be noticed that the defining of a fuzzy set for a given residual
is carried out individually for each fault, while fuzzy sets for a given residual
relate during fault isolation to all faults that are detected by a given test.
It is advisable to use knowledge about the sensitivity of changes of residual
values to particular faults during their isolation.
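The fuzzy fault size measure of (18.83) can be sketched as follows. The shape of the fuzzy set "significant fault" is an assumption here: a ramp that is zero below `start`, rises linearly and equals one above `full`. The actual shape is the one given in Fig. 18.2; names and numbers are hypothetical.

```python
def significant_fault(abs_residual, start, full):
    """Membership degree in the fuzzy set 'significant fault' (assumed ramp)."""
    if abs_residual <= start:
        return 0.0
    if abs_residual >= full:
        return 1.0
    return (abs_residual - start) / (full - start)


def fuzzy_fault_size(residuals, shapes):
    """Mean membership degree over the residuals of the detecting signals.
    shapes[j] = (start, full) is defined individually for the given fault."""
    degrees = [significant_fault(abs(residuals[j]), *shapes[j])
               for j in residuals]
    return sum(degrees) / len(degrees)


size = fuzzy_fault_size(residuals={"s3": 0.5, "s4": -2.0},
                        shapes={"s3": (0.0, 1.0), "s4": (0.5, 1.5)})
print(size)
```

Because the ramp parameters are stored per signal for each fault, the same residual can be evaluated differently for different faults, as the text recommends.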



18.9. Detecting a comeback to the normal state 

The current diagnosing system should not only detect and isolate the
appearing faults but also detect comebacks to the normal state. A complete
mechanism of the recognition of fault state changes from the value $z(f_k) = 1$
to $z(f_k) = 0$ is an equivalent of the above-mentioned diagnosing algorithm.

However, it is advisable to simplify this mechanism due to a significantly
lower weight of diagnoses that inform about a comeback to the normal state
in comparison with diagnoses that inform about the appearance of a fault.

The following simple variant of the detection of a comeback to the
normal state is applied. It consists in cyclical control of subsets of diagnostic
signals $S(f_k)$ that detect previously isolated faults $f_k \in F(1)_{n-1}$. If all of
the diagnostic signals that detect a given fault take positive values, then the
comeback to the normal state is ascertained:

$\forall_{j:\, s_j \in S(f_k)} \; s_j = 0 \Rightarrow z(f_k) = 0$. (18.84)

The set of removed faults is therefore defined by the following formula:

$\Delta F(0)_n = \{f_k \in F(1)_{n-1} : \forall_{j:\, s_j \in S(f_k)} \; s_j = 0\}$. (18.85)

Defining the set $\Delta F(0)_n$ means formulating a diagnosis about removing
faults that belong to this set. The implementation of the algorithm is very
simple. It should be noticed that removing the fault $f_k$ could be detected in
particular cases with a delay. It happens only when the negative value of one
of the diagnostic signals that belong to $S(f_k)$ remains due to the existence
of another fault that is also detected by this signal. The probability of such
cases is, however, very low.
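The simplified detection of a comeback to the normal state amounts to one set comprehension. The sketch below uses hypothetical names, with faults and signals represented by numbers.

```python
def removed_faults(isolated, signals_of, values):
    """Faults from the previously isolated set whose detecting signals
    have all returned to zero."""
    return {f for f in isolated
            if all(values[s] == 0 for s in signals_of[f])}


signals_of = {10: {3, 4}, 12: {2, 5}}
values = {2: 0, 3: 0, 4: 0, 5: 1}
gone = removed_faults({10, 12}, signals_of, values)
print(gone)
```

The second fault is still reported because one of its signals remains at a non-zero value, which is also the mechanism behind the delayed detection described above.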



18.10. Defining the weight of generated diagnoses 

The weight of particular diagnoses is not identical. The priorities of required
protection activities are therefore also different. Thus, the weight of all
generated diagnoses should be defined in order to make (automatically by the
computer, or manually by the process operator) necessary decisions about
the order of protection activities, especially if more faults appeared within a
short time interval.

In order to obtain this, a weight coefficient $w(f_k)$ should be assigned
to each one of the faults (the more dangerous the fault the higher the
coefficient). The weight (priority) of the diagnosis is calculated using the weight
coefficients $w(f_k)$ of particular faults shown in the diagnosis:

$W_{DGN} = \max_{k:\, f_k \in DGN} \{w(f_k)\}$. (18.86)

The range of changes of the weight coefficient of diagnoses $W_{DGN}$ can be
divided into separate ranges by assigning, for instance, different colours or
other graphical distinctions to these ranges. The distinctions will be used
during the visualisation of the diagnoses on the screens.
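The weighting of a diagnosis and its mapping to a display colour can be sketched as follows. Taking the largest fault weight as the weight of the diagnosis is an assumption of this sketch, and the colour bands and weight values are purely hypothetical.

```python
def diagnosis_weight(diagnosis, weights):
    """Weight of a diagnosis, assumed here to be the largest fault weight."""
    return max(weights[f] for f in diagnosis)


def colour(weight, bands=((0.7, "red"), (0.4, "yellow"))):
    """Map a weight to a colour; bands are (lower bound, name), highest first."""
    for lower, name in bands:
        if weight >= lower:
            return name
    return "green"


weights = {"f4": 0.3, "f10": 0.8}
w = diagnosis_weight({"f4", "f10"}, weights)
print(w, colour(w), colour(diagnosis_weight({"f4"}, weights)))
```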



18.11. Examples of state monitoring 
18.11.1. DTS method 

Let us assume that the diagnostics of the three-tank system (Chapter 1) is
carried out with the use of five diagnostic signals: $S = \{s_1, s_2, s_3, s_4, s_5\}$,
whose algorithms are given in Table 11.3. Table 11.2 presents a list of the
analysed faults. The binary diagnostic matrix has been expanded with
additional parameters and is shown in Table 18.2.



Table 18.2. Binary diagnostic matrix for the three-tank set

S/F   f1  f2  f3  f4  f5  f6  f7  f8  f9  f10 f11 f12 f13 f14 | θj
s1    1   0   0   0   1   1   1   1   0   0   0   0   0   0   | 2
s2    1   1   1   0   0   0   0   0   1   0   0   1   0   0   | 3
s3    0   1   1   1   0   0   0   0   1   1   0   0   1   0   | 3
s4    0   0   1   1   0   0   0   0   0   1   1   0   0   1   | 3
s5    1   1   1   1   0   0   0   0   0   0   1   1   1   1   | 7
τk    4   6   6   4   4   4   4   4   8   8   8   4   4   4   |





The additional column presents symptom times for particular tests. The 
shorter symptom time for the first test is caused by the actuator unit’s dy- 
namic properties, which are similar to those of a proportional element. On 
the other hand, the fifth test has the highest symptom time since the test 
controls a third-order object. A row that contains admissible times for taking 
up a protective action for particular faults has been added to the table. 

Inference on the assumption about single faults

Let us assume that the fault $f_{10}$ appeared: partial clogging of the
channel between the tanks 2 and 3. The symptom $s_3 = 1$ has been observed
as the first one during the cyclical realisation of tests. The fault isolation
process using the DTS method is carried out as follows:

At the moment $t^1 = 0$ of symptom detection, the following sets are
defined: the set of possible faults $F^1$, the set of diagnostic signals useful for
fault isolation $S^1$, and also the limit for the diagnosing time $\tau^1$, the subset
of diagnostic signals that fulfil this limit $S^{1*}$, and the set of the diagnostic
signals $S_W(r)$ used:

$t^1 = 0$, $s_3 = 1 \Rightarrow F^1 = F(s_3) = \{f_2, f_3, f_4, f_9, f_{10}, f_{13}\}$,

$S^1 = \{s_2, s_3, s_4, s_5\}$,

$\tau^1 = \min\{6, 6, 4, 8, 8, 4\} = 4$,

$S^{1*} = \{s_2, s_3, s_4\}$.

The moments of analysing diagnostic signal values are calculated:

$\theta^2 = \theta_2 = 3$, $\theta^3 = \theta_3 = 3$, $\theta^4 = \theta_4 = 3$, $\theta^5 = \theta_5 = 7$.

Then, the values of particular diagnostic signals are analysed at
appropriate moments:

$t = \theta^2 = 3$, $s_2 = 0 \Rightarrow F^2 = \{f_4, f_{10}, f_{13}\}$ - a reduction of the set
of possible faults,
$\tau^2 = \min\{4, 8, 4\} = 4$, $S^{2*} = \{s_2, s_3, s_4\}$, $S_W(2) = \{s_2\}$.

$t = \theta^3 = 3$, $s_3 = 1 \Rightarrow F^3 = \{f_4, f_{10}, f_{13}\}$ - inference verification,
$\tau^3 = \tau^2$, $S^{3*} = S^{2*}$, $S_W(3) = \{s_2, s_3\}$.

$t = \theta^4 = 3$, $s_4 = 1 \Rightarrow F^4 = \{f_4, f_{10}\}$ - a reduction of the set
of possible faults,
$\tau^4 = \min\{4, 8\} = 4$, $S^{4*} = \{s_2, s_3, s_4\}$, $S_W(4) = \{s_2, s_3, s_4\}$.

At the moment ($t = 3$), all diagnostic signals that fulfil the requirement
for a limit for the diagnosing time were analysed, so the initial diagnosis is
generated: $DGN(\tau) = \{\{f_4\}, \{f_{10}\}\}$.

It indicates two faults indistinguishable on the basis of signals that
belong to the set $S^{1*}$. The aim of further diagnosing is to specify the initial
diagnosis with the use of the remaining diagnostic signals that belong to the
set $S^1$. In this case, it is the signal $s_5$:

$t = \theta^5 = 7$, $s_5 = 0 \Rightarrow F^5 = \{f_{10}\}$ - a reduction of the set of possible faults,
$S_W(5) = \{s_2, s_3, s_4, s_5\}$.
The following condition is fulfilled: $S_W(5) = S^1$. The final diagnosis that has
the highest accuracy and the lowest probability of a false diagnosis takes the
shape $DGN(A) = \{f_{10}\}$.

In this case, it is the same as the diagnosis that has the highest accuracy
and the lowest time of diagnosing, $DGN(T_D) = \{f_{10}\}$, since in order to obtain
the highest accuracy, the signal that has the highest symptom period
was necessary. Both of the above diagnoses are generated at the moment
$t = 7$.
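The single-fault inference traced above can be replayed with a short script. This is a simplified sketch rather than the full DTS algorithm: the limit for the diagnosing time is computed once and not updated after each reduction, signals are analysed in the order of their symptom times, and the detection sets, symptom times and admissible times are an assumed reading of Table 18.2 (faults and signals are represented by their numbers).

```python
F_OF_SIGNAL = {1: {1, 5, 6, 7, 8}, 2: {1, 2, 3, 9, 12},
               3: {2, 3, 4, 9, 10, 13}, 4: {3, 4, 10, 11, 14},
               5: {1, 2, 3, 4, 11, 12, 13, 14}}
THETA = {1: 2, 2: 3, 3: 3, 4: 3, 5: 7}                 # symptom times
TAU = {1: 4, 2: 6, 3: 6, 4: 4, 5: 4, 6: 4, 7: 4, 8: 4,
       9: 8, 10: 8, 11: 8, 12: 4, 13: 4, 14: 4}        # admissible times


def dts_single_fault(first_signal, values):
    """Return (initial diagnosis within the time limit, final diagnosis)."""
    possible = set(F_OF_SIGNAL[first_signal])
    useful = {j for j, faults in F_OF_SIGNAL.items() if faults & possible}
    limit = min(TAU[k] for k in possible)
    fast = {j for j in useful if THETA[j] <= limit}
    initial = None
    for j in sorted(useful, key=lambda j: (THETA[j], j)):
        if j not in fast and initial is None:
            initial = set(possible)        # all fast signals already analysed
        if values[j] == 1:
            possible &= F_OF_SIGNAL[j]     # keep faults detected by s_j
        else:
            possible -= F_OF_SIGNAL[j]     # drop faults detected by s_j
    return (initial if initial is not None else set(possible)), possible


first = dts_single_fault(3, {2: 0, 3: 1, 4: 1, 5: 0})
second = dts_single_fault(1, {1: 1, 2: 1, 5: 1})
print(first, second)
```

The first call reproduces the case in which the slowest signal is needed to separate two faults; in the second call the initial diagnosis already contains a single fault and the slowest signal only confirms it.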

Let us assume that the fault $f_1$ appeared - a fault of the measurement
line of the flow $F$. The symptom $s_1 = 1$ has been observed as the first one
during the cyclical realisation of tests. The fault isolation process using the
DTS method is carried out as follows:

$t^1 = 0$, $s_1 = 1 \Rightarrow F^1 = F(s_1) = \{f_1, f_5, f_6, f_7, f_8\}$,

$S^1 = \{s_1, s_2, s_5\}$, $\tau^1 = \min\{4, 4, 4, 4, 4\} = 4$, $S^{1*} = \{s_1, s_2\}$.

The moments of analysing diagnostic signal values are calculated:

$\theta^2 = \theta_1 = 2$, $\theta^3 = \theta_2 = 3$, $\theta^4 = \theta_5 = 7$.

Then, the values of particular diagnostic signals are analysed at
appropriate moments:

$t = \theta^2 = 2$, $s_1 = 1 \Rightarrow F^2 = F^1$ - inference verification,
$\tau^2 = \tau^1$, $S^{2*} = S^{1*}$, $S_W(2) = \{s_1\}$.

$t = \theta^3 = 3$, $s_2 = 1 \Rightarrow F^3 = \{f_1\}$ - a reduction of the set of possible faults,
$\tau^3 = 4$, $S^{3*} = \{s_1, s_2\}$, $S_W(3) = \{s_1, s_2\}$.

All diagnostic signals that fulfil the requirement for a limit for the
diagnosing time have now been analysed, so the initial diagnosis is generated:
$DGN(\tau) = \{f_1\}$.

It shows one fault, therefore it is also the diagnosis that has the highest
accuracy and is formulated after the shortest time: $DGN(T_D) = \{f_1\}$.

In order to generate the most credible diagnosis, it is necessary to carry 
out an additional analysis of the signal 55 , since — 5^/(3) = { 55 }: 

t — 7, v{s^) = 1 => F"^ — {fi} - inference verification, 

5u/(4) = { 51 , 52 , 55 }. 

The following condition is then fulfilled: Sw{^) — and the diagnosis takes 
the shape DGN{A) = {/i}. 
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In this case, it is the same as the diagnosis DGN{Tjd), but is was gen- 
erated after a longer time (t = 7 ) with the use of the additional signal 55, 
which confirms the possibility of the existence of the fault fi . 
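The single-fault reasoning above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sets `F_OF`, the symptom times `THETA` and the admissible times `TAU` are transcribed from the three-tank example, the function name `dts_single_fault` is hypothetical, and the time limit is computed once from $F^1$ instead of being updated at every step.

```python
# F(s_j): faults detected by each diagnostic signal (three-tank example).
F_OF = {
    1: {1, 5, 6, 7, 8},
    2: {1, 2, 3, 9, 12},
    3: {2, 3, 4, 9, 10, 13},
    4: {3, 4, 10, 11, 14},
    5: {1, 2, 3, 4, 11, 12, 13, 14},
}
THETA = {1: 2, 2: 3, 3: 3, 4: 3, 5: 7}            # symptom times theta_j
TAU = {1: 4, 2: 6, 3: 6, 4: 4, 5: 4, 6: 4, 7: 4, 8: 4,
       9: 8, 10: 8, 11: 8, 12: 4, 13: 4, 14: 4}    # admissible times tau_k


def dts_single_fault(first, observed, available=(1, 2, 3, 4, 5)):
    """Return (diagnosis within the time limit, diagnosis after all signals)."""
    faults = set(F_OF[first])                       # F^1 = F(s_x)
    # S^1: available signals that detect at least one candidate fault,
    # analysed in the order of their symptom times.
    useful = sorted((j for j in available if F_OF[j] & faults), key=THETA.get)
    limit = min(TAU[k] for k in faults)             # tau^1 (kept fixed here)
    within_limit = None
    for j in useful:
        if within_limit is None and THETA[j] > limit:
            within_limit = set(faults)              # snapshot = DGN(tau)
        if observed[j] == 1:
            faults &= F_OF[j]                       # symptom present: keep explaining faults
        else:
            faults -= F_OF[j]                       # symptom absent: drop faults it would reveal
    if within_limit is None:
        within_limit = set(faults)
    return within_limit, faults


print(dts_single_fault(1, {1: 1, 2: 1, 5: 1}))      # ({1}, {1})
```

The sketch does not handle the inconsistency case (an empty candidate set), which the text treats separately under the assumption about multiple faults.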

Inference on the assumption about multiple faults 

Let us assume that the two following faults appeared simultaneously: $f_{11}$
(partial clogging of the outlet) and $f_{12}$ (a leak from the tank 1). The
symptom $s_4 = 1$ has been observed as the first one during the cyclical realisation
of tests. Let us assume that a limit for the diagnosing time is not applied, and
the diagnosis is formulated with the aim of obtaining the highest credibility.
The fault isolation process using the DTS method is carried out as follows:

$t = 0$, $s_4 = 1 \Rightarrow F^1 = F(s_4) = \{f_3, f_4, f_{10}, f_{11}, f_{14}\}$,

$S^1 = \{s_2, s_3, s_4, s_5\}$, $S^{1*} = S^1$.

The moments of analysing diagnostic signal values are calculated:

$\theta_2^1 = \theta_2 = 3$, $\theta_3^1 = \theta_3 = 3$, $\theta_4^1 = \theta_4 = 3$, $\theta_5^1 = \theta_5 = 7$.

Then, the values of particular diagnostic signals are analysed at
appropriate moments:

$t = \theta_2^1 = 3$, $s_2 = 1 \Rightarrow F^2 = \{f_3\}$ - a reduction of the set of possible faults,

$S^{2*} = S^1$, $S_W(2) = \{s_2\}$,

$t = \theta_3^1 = 3$, $s_3 = 0 \Rightarrow F^3 = \emptyset$ - the detection of an inconsistency,

$S_W(3) = \{s_2, s_3\}$.

After the inconsistency has been ascertained, the inference is carried out
on the assumption about multiple faults. It begins after symptom times for
all of the diagnostic signals that belong to the set $S^1$ have elapsed, i.e., at
the moment $t = \max\{\theta_j : s_j \in S^1\} = 7$. The values of diagnostic signals that
belong to the set $S^1$: $s_2 = 1$, $s_3 = 0$, $s_4 = 1$, $s_5 = 1$ are registered. The
expanded set of possible faults is created:

$F^* = F(s_2) \cup F(s_4) \cup F(s_5) = \{f_1, f_2, f_3, f_4, f_9, f_{10}, f_{11}, f_{12}, f_{13}, f_{14}\}$,

as well as the set of diagnostic signals that are useful for fault isolation:
$S^* = \{s_1, s_2, s_3, s_4, s_5\}$.

The subsets $S(f_k)$ are defined for faults that belong to the set $F^*$:

$S(f_1) = \{s_1, s_2, s_5\}$, $S(f_2) = \{s_2, s_3, s_5\}$, $S(f_3) = \{s_2, s_3, s_4, s_5\}$,

$S(f_4) = \{s_3, s_4, s_5\}$, $S(f_9) = \{s_2, s_3\}$, $S(f_{10}) = \{s_3, s_4\}$,

$S(f_{11}) = \{s_4, s_5\}$, $S(f_{12}) = \{s_2, s_5\}$, $S(f_{13}) = \{s_3, s_5\}$,

$S(f_{14}) = \{s_4, s_5\}$.

Out of the above subsets, all signals take values equal to 1 only for the
faults $f_{11}$, $f_{12}$, and $f_{14}$. They are indicated in the diagnosis
$DGN^* = \{f_{11}, f_{12}, f_{14}\}$. The diagnosis showing the existence of multiple
faults indicates three faults, out of which $f_{11}$ and $f_{14}$ are indistinguishable.
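The selection rule used in this step (a fault is kept when every signal of its subset $S(f_k)$ shows a symptom) can be illustrated as follows. This is a minimal sketch: the subsets are those listed above, the function name is hypothetical, and the value $s_1 = 0$ is an assumption, since the text registers only $s_2, \ldots, s_5$.

```python
# S(f_k) for the faults of the expanded set F*, as listed in the text.
S_OF = {
    1: {1, 2, 5}, 2: {2, 3, 5}, 3: {2, 3, 4, 5}, 4: {3, 4, 5}, 9: {2, 3},
    10: {3, 4}, 11: {4, 5}, 12: {2, 5}, 13: {3, 5}, 14: {4, 5},
}


def multiple_fault_diagnosis(values):
    """Keep the faults whose whole subset S(f_k) shows the value 1."""
    symptoms = {j for j, v in values.items() if v == 1}
    return {k for k, signals in S_OF.items() if signals <= symptoms}


# Registered values; s1 = 0 is assumed (it is not quoted in the example).
values = {1: 0, 2: 1, 3: 0, 4: 1, 5: 1}
print(sorted(multiple_fault_diagnosis(values)))   # [11, 12, 14]
```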

Diagnosing after the set of available diagnostic signals 
has been reduced 

If the diagnosis DGN — {/lo} has been generated, the subset of diagnostic 
signals that detect this fault is suspended. These signals are 53 and 54. The 
new set of available signals is as follows: 5^+1 ~ {51,52,55}. The modified 
diagnostic matrix is shown in Table 18 . 3 . 



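Table 18.3. Modified binary diagnostic matrix for the three-tank set after
the availability of the diagnostic signals $s_3$ and $s_4$ has been suspended

| S/F | $f_1$ | $f_2$ | $f_3$ | $f_4$ | $f_5$ | $f_6$ | $f_7$ | $f_8$ | $f_9$ | $f_{10}$ | $f_{11}$ | $f_{12}$ | $f_{13}$ | $f_{14}$ | $\theta_j$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $s_1$ | 1 |  |  |  | 1 | 1 | 1 | 1 |  |  |  |  |  |  | 2 |
| $s_2$ | 1 | 1 | 1 |  |  |  |  |  | 1 |  |  | 1 |  |  | 3 |
| $s_3$ |  | 1 | 1 | 1 |  |  |  |  | 1 | 1 |  |  | 1 |  | 3 |
| $s_4$ |  |  | 1 | 1 |  |  |  |  |  | 1 | 1 |  |  | 1 | 3 |
| $s_5$ | 1 | 1 | 1 | 1 |  |  |  |  |  |  | 1 | 1 | 1 | 1 | 7 |
| $\tau_k$ | 4 | 6 | 6 | 4 | 4 | 4 | 4 | 4 | 8 | 8 | 8 | 4 | 4 | 4 |  |
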




Let us assume now that the new fault, $f_1$, appears before the fault $f_{10}$
is eliminated. The fault isolation process using the DTS method is carried
out in this case just like in the above example. The signals $s_3$ and $s_4$ in
this particular case are not included in the set of signals $S^1 = \{s_1, s_2, s_5\}$,
useful for this fault isolation process.

However, if the fault $f_{14}$ (a leak from the tank 3) occurs after the fault
$f_{10}$ appeared but before it is eliminated, then the inference process on the
basis of the signals $S_{n+1} = \{s_1, s_2, s_5\}$ looks as follows:

$t = 0$, $s_5 = 1 \Rightarrow F^1 = F(s_5) = \{f_1, f_2, f_3, f_4, f_{11}, f_{12}, f_{13}, f_{14}\}$,

$S^1 = \{s_1, s_2, s_5\}$, $\tau^1 = 4$, $S^{1*} = \{s_1, s_2\}$.

The moments of analysing diagnostic signal values are calculated:

$\theta_1^1 = \theta_1 = 2$, $\theta_2^1 = \theta_2 = 3$, $\theta_5^1 = \theta_5 = 7$.

Then, the values of particular diagnostic signals are analysed at
appropriate moments:

$t = \theta_1^1 = 2$, $s_1 = 0 \Rightarrow F^2 = \{f_2, f_3, f_4, f_{11}, f_{12}, f_{13}, f_{14}\}$ - reduction,

$\tau^2 = 4$, $S^{2*} = \{s_1, s_2\}$, $S_W(2) = \{s_1\}$;

$t = \theta_2^1 = 3$, $s_2 = 0 \Rightarrow F^3 = \{f_4, f_{11}, f_{13}, f_{14}\}$ - reduction,

$\tau^3 = 4$, $S^{3*} = \{s_1, s_2\}$, $S_W(3) = \{s_1, s_2\}$.

At the moment $t = 3$, all diagnostic signals that fulfil the requirement
for a limit of the diagnosing time have been analysed, $S^{3*} - S_W(3) = \emptyset$,
so the initial diagnosis is generated: $DGN(\tau) = \{\{f_4\}, \{f_{11}\}, \{f_{13}\}, \{f_{14}\}\}$.
The diagnosis is also the one that has the highest accuracy and is developed
after the shortest time, since a repeated analysis of the value of the signal $s_5$,
which detected the fault, can be used for inference verification only:

$DGN(T_D) = \{\{f_4\}, \{f_{11}\}, \{f_{13}\}, \{f_{14}\}\}$.

In order to derive the diagnosis $DGN(A)$, a repeated analysis of the
signal $s_5$ is carried out:

$t = \theta_5^1 = 7$, $s_5 = 1 \Rightarrow F^4 = \{f_4, f_{11}, f_{13}, f_{14}\}$ - verification,

$S_W(4) = \{s_1, s_2, s_5\}$.

The following condition is fulfilled in this case:
$S_W(4) = S^1 \Rightarrow DGN(A) = \{\{f_4\}, \{f_{11}\}, \{f_{13}\}, \{f_{14}\}\}$.

All of the diagnoses are the same and indicate four faults that are
indistinguishable on the basis of diagnostic signals that belong to the set
$S^1 = \{s_1, s_2, s_5\}$. It is easy to notice that if the preceding fault $f_{10}$ had
not appeared, the diagnosis would have been more accurate: it would have
indicated the faults $f_{11}$ and $f_{14}$.
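The loss of distinguishability caused by suspending signals can be checked with a short sketch. The fault signatures below are transcribed from the three-tank example; the function names are hypothetical, and grouping faults by their restricted signature is one possible way of expressing "indistinguishable", not necessarily the authors' implementation.

```python
from collections import defaultdict

# S(f_k): signals that detect each fault of the three-tank example.
SIGNATURES = {
    1: {1, 2, 5}, 2: {2, 3, 5}, 3: {2, 3, 4, 5}, 4: {3, 4, 5},
    5: {1}, 6: {1}, 7: {1}, 8: {1}, 9: {2, 3}, 10: {3, 4},
    11: {4, 5}, 12: {2, 5}, 13: {3, 5}, 14: {4, 5},
}


def suspend_signals(available, diagnosed):
    """Remove the signals that detect any already diagnosed fault."""
    blocked = set().union(*(SIGNATURES[k] for k in diagnosed))
    return set(available) - blocked


def signature_groups(available):
    """Group faults that share the same signature on the available signals."""
    groups = defaultdict(set)
    for fault, signals in SIGNATURES.items():
        groups[frozenset(signals & available)].add(fault)
    return dict(groups)


reduced = suspend_signals({1, 2, 3, 4, 5}, {10})
print(sorted(reduced))                                                # [1, 2, 5]
print(sorted(signature_groups(reduced)[frozenset({5})]))              # [4, 11, 13, 14]
print(sorted(signature_groups({1, 2, 3, 4, 5})[frozenset({4, 5})]))   # [11, 14]
```

With all five signals available only $f_{11}$ and $f_{14}$ share a signature, which agrees with the remark that the diagnosis would have been more accurate without the preceding fault.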

18.11.2. F-DTS method 

A case of single faults will be considered. Let us assume that the
three-tank set monitoring is realised (as above) with the use of five tests,
$S = \{s_1, s_2, s_3, s_4, s_5\}$. The relation faults-symptoms is defined by the FIS
(for system states $z = \{z_0, z_1, \ldots\}$, shown in Table 18.4). Symptom times of
particular diagnostic signals as well as admissible times for taking up the
protection activity are the same as in the case of the DTS method.

Let us assume that the fault $f_4$ appeared - a fault of the level
measurement line of the tank 3. During the cyclical realisation of tests (with
M 0.6), the symptom $s_3 = \{(0, 0.1), (+1, 0), (-1, 0.9)\}$ has been observed
as the first one. The fault isolation process using the F-DTS method is carried
out as follows:




Table 18.4. FIS - the information system of states for the three-tank set
(value sets of the diagnostic signals: $\{0, +1, -1\}$ and $\{0, 1\}$)

The set of possible faults is defined: $F^1 = \{f_2, f_3, f_4, f_9, f_{13}\}$. The
following system states with single faults correspond to these faults:
$Z^1 = \{z_2, z_3, z_4, z_9, z_{13}\}$. The set of diagnostic signals that are useful for
fault isolation is as follows: $S^1 = \{s_2, s_3, s_4, s_5\}$.

The set of possible states and the set of tests that are useful for their
isolation create a subset of the FIS. The subset is then used in the further
inference process (Table 18.5).

Let us calculate a limit for the diagnosing time: $\tau^1 = \min\{6, 6, 4, 8, 4\} = 4$,
and the set of diagnostic signals that fulfil this requirement:
$S^{1*} = \{s_2, s_3, s_4\}$. During the next step, the time necessary for settling the
values of all diagnostic signals that belong to the set $S^{1*}$ is ascertained:
$\theta^{1*} = 3$. Then, the subset of the FIS shown in Table 18.5, without the
signal $s_5$, is considered in order to formulate the initial diagnosis.

After the time limit has elapsed, the following values of fuzzy diagnostic
signals that belong to the set $S^{1*}$ are obtained at the moment $t = \theta^{1*}$:

$s_2 = \{(0, 1), (+1, 0), (-1, 0)\}$,

$s_3 = \{(0, 0.1), (+1, 0), (-1, 0.9)\}$,

$s_4 = \{(0, 0.2), (+1, 0.8), (-1, 0)\}$.

Using the PROD operator, the coefficients of the certainty of states that
belong to the set $Z^1$ are calculated. They take the following values:

$\mu^1(z_2) = 0 \times 0.9 \times 0.2 = 0$,

$\mu^1(z_3) = 0 \times 0.9 \times 0.8 = 0$,

$\mu^1(z_4) = 1 \times 0.9 \times 0.8 = 0.72$,

$\mu^1(z_9) = 0 \times 0.9 \times 0.2 = 0$,

$\mu^1(z_{13}) = 1 \times 0.9 \times 0.2 = 0.18$.

The diagnosis generated before the limit for the diagnosing time has elapsed
(with $K = 0.15$) is as follows: $DGN(\tau) = \{(z_4, 0.72), (z_{13}, 0.18)\}$. The
diagnosis indicates two states, but the certainty coefficient value of the state $z_4$
is considerably higher than that of the state $z_{13}$.

Then, the time at which the remaining diagnostic signal values settle,
$\theta_5 = 7$, is ascertained. At the moment $t = 7$, the value of the diagnostic
signal $s_5$ is examined. According to the FIS, $s_5$ is a fuzzy two-value signal.
Let us assume that $s_5 = \{(0, 0.1), (1, 0.9)\}$. Therefore, coefficients of the
certainty of states shown in the initial diagnosis are modified as follows:
$\mu(z_4) = 0.72 \times 0.9 = 0.648$ and $\mu(z_{13}) = 0.18 \times 0.1 = 0.018$.

The final diagnosis is as follows: $DGN(\tau) = \{(z_4, 0.648)\}$. It correctly
indicates the state that occurred.
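The arithmetic of this example can be reproduced with a short sketch of the PROD operator. The fuzzy signal values and the threshold $K = 0.15$ come from the text. The reference patterns in `PATTERNS` are an assumption: Table 18.5 is not reproduced here, so they were chosen only so that the factors match the products quoted above (in particular, the value +1 for $s_2$ in the states $z_2$, $z_3$ and $z_9$ is a placeholder for "a non-zero reference value"). Applying the threshold again after $s_5$ is examined is also an assumption.

```python
from math import prod

# Fuzzy signal values from the text: reference value -> membership degree.
SIGNALS = {
    "s2": {0: 1.0, +1: 0.0, -1: 0.0},
    "s3": {0: 0.1, +1: 0.0, -1: 0.9},
    "s4": {0: 0.2, +1: 0.8, -1: 0.0},
    "s5": {0: 0.1, 1: 0.9},
}

# ASSUMED reference patterns of the candidate states (hypothetical values
# chosen to reproduce the products quoted in the example).
PATTERNS = {
    "z2": {"s2": +1, "s3": -1, "s4": 0},
    "z3": {"s2": +1, "s3": -1, "s4": +1},
    "z4": {"s2": 0, "s3": -1, "s4": +1, "s5": 1},
    "z9": {"s2": +1, "s3": -1, "s4": 0},
    "z13": {"s2": 0, "s3": -1, "s4": 0, "s5": 0},
}
K = 0.15  # threshold on the certainty coefficient


def certainty(state, signal_names):
    """PROD operator: product of memberships of the state's reference values."""
    return prod(SIGNALS[s][PATTERNS[state][s]] for s in signal_names)


def diagnose(states, signal_names):
    """Keep the states whose certainty coefficient reaches the threshold K."""
    scores = {z: round(certainty(z, signal_names), 3) for z in states}
    return {z: mu for z, mu in scores.items() if mu >= K}


initial = diagnose(PATTERNS, ["s2", "s3", "s4"])
final = diagnose(initial, ["s2", "s3", "s4", "s5"])
print(initial)   # {'z4': 0.72, 'z13': 0.18}
print(final)     # {'z4': 0.648}
```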
18.11.3. T-DTS method

In order to better present the properties of the T-DTS method, let us consider
a set of two tanks presented in Fig. 18.3 as the diagnosed system instead of
the three-tank set. The diagnosis is carried out with the use of analogue
signals: $U$ denotes the control signal, $F$ is a flow through the control valve,
$L_2$ denotes the level of liquid in the tank 2, and $L_{1hi}$ and $L_{1lo}$ are binary
signals from level sensors installed in the tank 1. Faults of the controller are
not taken into consideration. A case of the existence of single faults will be
analysed.




Fig. 18.3. Diagram of the analysed system 



The set of possible faults is presented in Table 18.6, and the set of tests
in Table 18.7. The residuals $r_1$ and $r_4$ are generated on the basis of models
of the actuator set and the two-tank set, respectively, while the residuals $r_2$
and $r_3$ are based on the examination of binary signal states when level limits
in the first tank are exceeded. Tables 18.8 and 18.9 show the parameters of
the expanded FIS.

Case 1. Let us assume that the fault $f_{10}$ (a leak from the tank 1) appeared
at the moment $t = 0$. The changes of symptom values are shown in Fig. 18.4.
The algorithm forms the diagnosis in the following steps. The symptom
$s_4 = 1$ is detected at the moment $t = 2$:

$t = 0$, $s_4 = 1$, $DGN^*/s_4 = \{f_3, f_4, f_9, f_{10}, f_{11}\}$,

$DGN/s_4 = \{f_3, f_4, f_9, f_{10}, f_{11}\}$.

The faults $f_{10}$, $f_{11}$ are conditionally indistinguishable. The set of symptoms
that are useful for the isolation of faults belonging to the set $DGN/s_4$ is
defined: $S^* = \{s_1, s_2, s_3\}$.

Table 18.6. Set of possible faults

| F | Fault description |
|---|---|
| $f_1$ | fault of the level binary sensor $L_{1lo}$ |
| $f_2$ | fault of the level binary sensor $L_{1hi}$ |
| $f_3$ | fault of the level measuring transducer $L_2$ |
| $f_4$ | fault of the flow measuring transducer $F$ |
| $f_5$ | valve blocked in the closed position or a fault of the pump |
| $f_6$ | valve blocked in the open position |
| $f_7$ | parametric fault of the valve or the pump |
| $f_8$ | partial clogging of the channel between the tanks 1 and 2 |
| $f_9$ | partial clogging of the outlet from the tank 2 |
| $f_{10}$ | leak from the tank 1 |
| $f_{11}$ | leak from the tank 2 |

Table 18.7. Set of tests

| R | Residual generation algorithms |
|---|---|
| $r_1$ | $r_1 = F - \hat{F} = F - f(U)$ |
| $r_2$ | $r_2 = L_{1hi} - 1$ (signal $L_{1hi}$ equals 1 in the normal state) |
| $r_3$ | $r_3 = L_{1lo} - 1$ (signal $L_{1lo}$ equals 1 in the normal state) |
| $r_4$ | $r_4 = L_2 - \hat{L}_2 = L_2 - f(F)$ |



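Table 18.8. Function $r(f, s)$ - the binary diagnostic matrix

| F/S | $f_1$ | $f_2$ | $f_3$ | $f_4$ | $f_5$ | $f_6$ | $f_7$ | $f_8$ | $f_9$ | $f_{10}$ | $f_{11}$ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| $s_1$ |  |  |  | 1 | 1 | 1 | 1 |  |  |  |  |
| $s_2$ | 1 |  |  |  | 1 |  |  |  |  | 1 | 1 |
| $s_3$ |  | 1 |  |  |  | 1 |  | 1 | 1 |  |  |
| $s_4$ |  |  | 1 | 1 |  |  |  |  | 1 | 1 | 1 |
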



The limit times for symptoms that belong to the set $S^*$ for faults indicated
in the current diagnosis $DGN/s_4$ are as follows:

— 0, — 2, rg^3 = 12, Tg 3 = 24, 

^ 10,2 ~ '^ 10,2 “ '^ 11,2 ” "^7 ^ 11,2 ” 
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Table 18.9. Function q(f\ s) 







Fig. 18.4. Changes of diagnostic signal values 



The limit time of the symptom $s_1$, equal to 2, elapses at the moment $t = 2$,
therefore the symptom $s_1$ can be interpreted:

$t = 2$, $s_1 = 0$, $DGN^*/s_4, s_1 = \{f_3, f_9, f_{10}, f_{11}\}$,

$DGN/s_4, s_1 = \{f_3, f_9, f_{10}, f_{11}\}$.

The appearance of the symptom $s_2 = 1$ is observed at the moment $t = 5$.
The inference is as follows:

$t = 5$, $s_2 = 1$, $DGN^*/s_4, s_1, s_2 = \{f_{10}, f_{11}\}$,

$DGN/s_4, s_1, s_2 = f_{10}$.

In this case, the fault $f_{11}$ can be ruled out as soon as the occurrence of the
symptom $s_2$ is ascertained, because the time of this symptom for the fault
$f_{11}$ equals 7. Waiting for the symptom $s_3$ is not necessary. The diagnosing
time is minimal.
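The elimination of $f_{11}$ at $t = 5$ can be illustrated by a time-window check. This is a hedged sketch only: the name `filter_by_symptom_time` and the dictionary `WINDOWS` are hypothetical, and of the numbers below only the earliest time 7 for the pair ($f_{11}$, $s_2$) is taken from the text; the remaining bounds are placeholders, because the limit times of the example are not fully legible.

```python
# (fault, signal) -> (earliest, latest) time at which the symptom may appear,
# counted from the first observed symptom.
# Only the earliest time 7 for (f11, s2) is quoted in the text;
# all other bounds are PLACEHOLDER values used for illustration.
WINDOWS = {
    (10, 2): (2, 6),
    (11, 2): (7, 12),
}


def filter_by_symptom_time(candidates, signal, t):
    """Keep faults for which a symptom of `signal` at time `t` is admissible."""
    kept = set()
    for fault in candidates:
        earliest, latest = WINDOWS[(fault, signal)]
        if earliest <= t <= latest:
            kept.add(fault)
    return kept


print(filter_by_symptom_time({10, 11}, 2, 5))   # {10}
```

Under these assumed windows, a symptom $s_2$ at $t = 5$ arrives too early for $f_{11}$, so only $f_{10}$ remains, as in Case 1.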

Case 2. The symptom $s_2 = 1$ has been observed. The inference process is
as follows:

$t = 0$, $s_2 = 1$, $DGN^*/s_2 = \{f_1, f_5, f_{10}, f_{11}\}$,

$DGN/s_2 = f_1$.

In this case, the occurrence of the faults $f_5$, $f_{10}$, and $f_{11}$ can be ruled
out at the very beginning since they would be preceded by other symptoms,
according to the function $q(f, s)$. Let us notice that the faults $f_{10}$ and $f_{11}$
as well as $f_2$ and $f_8$ are indistinguishable in the FIS. Thanks to taking
into account the symptom times in the T-DTS method, the faults ($f_2$ and
$f_8$) are unconditionally distinguishable, and the faults ($f_{10}$ and $f_{11}$) are
conditionally distinguishable.



18.12. Summary

The fault information system has been used in the F-DTS and T-DTS meth- 
ods for the description of the relation existing between faults and symptoms. 
It provides a multi-value residual evaluation. On the other hand, diagnostic 
inference is carried out in the DTS method using the binary diagnostic ma- 
trix. Minimum and maximum values of the symptom appearance time are 
taken into account in order to increase fault distinguishability in the T-DTS 
method. In the other two methods, the dynamics of the appearance of symp- 
toms (maximum times) are taken into account only in order to avoid false 
diagnoses. The highest fault distinguishability can therefore be obtained by
using the T-DTS method, and the lowest by using the DTS method.

Symptom appearance uncertainties are taken into account in the F-DTS 
method only. Fuzzy logic is applied both during the evaluation of residuals 
and diagnostic inference. The generated diagnoses indicate possible faults to- 
gether with their appearance certainty coefficients. In the other two methods, 
classical logic has been used for the inference. Sufficiently high threshold val- 
ues should be used during the evaluation of residuals in the DTS and the 
T-DTS method in order to avoid false alarms. Therefore, the two methods 
are not suitable for the detection of parametric faults and those of a small 
size. The F-DTS method with proper detection algorithms is efficient in such 
cases. 

Series and parallel inference variants can be applied to all of the methods. 
Series inference has been discussed for the DTS and the T-DTS method while 
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the parallel one for the F-DTS method. The advantage of the series inference 
is that the actual diagnosis is generated during every step of the process. 

The presented methods have many advantages, including the following: 

• Universality, since they can be applied to the diagnostics of various 
industrial processes (systems) (e.g., in the power, chemical, sugar industries, 
etc.). Inference algorithms are completely independent of the diagnosed sys- 
tem. However, the set of the detection algorithms applied has to be adapted 
to the diagnosed system. 

• Adaptation to monitoring states in complex industrial installations.
Such systems contain hundreds or even thousands of components, which
increases the number of possible faults as well as the number of realised tests.
In the presented methods, not all diagnostic signals have to be analysed
in order to recognise a specified alarm situation. Dynamically defined subsets
of possible faults as well as subsets of diagnostic signals are used in the
inference process, which corresponds to the selection of a subsystem for which
the diagnosing is carried out. This limits the cost of calculations.

• The possibility of applying various fault detection methods depending 
on the knowledge about the diagnosed system. It is also possible to expand 
the diagnostic system in an easy way during the operation of the diagnosed 
system, while knowledge about it becomes deeper and deeper. 

• Taking into account the dynamics of the appearance of symptoms. 

• The possibility of introducing limits for the diagnosing time. 

• The ability to automatically adapt diagnostic activities in the case of 
changes of the system structure and changes of the set of available faults. 
The DTS, F-DTS and T-DTS methods define rules for choosing diagnostic 
signals that belong to the set of those which are actually available, as well as 
rules for interpreting their values. The F-DTS and T-DTS methods have been
implemented in the toolbox TB 5.5 of the Decision Support System (DSS)
developed within the framework of the European project CHEM (CHEM, 2002).
Chapter 19



DIAGNOSTICS OF INDUSTRIAL PROCESSES 
IN DECENTRALISED STRUCTURES 



Jan Maciej KOŚCIELNY*



19.1. Introduction 

The diagnostic system is most often integrated with the automatic control 
system. The structures of modern computer-based automatic control sys- 
tems that control industrial processes are decentralised and space-distributed 
(Fig. 19.1). It is advisable that diagnostic functions that constitute an integral 
part of control tasks and process protection be also realised in decentralised 
structures. 

It is therefore necessary to decompose the system into subsystems con- 
trolled and diagnosed by separate computer units during the design of the 
diagnostics of complex systems. Usually, completely independent subsystems 
cannot be separated. Due to that, symptoms of faults that occurred in one
subsystem can also be observed in other coupled subsystems. Diagnostics 
algorithms for decentralised structures must take this problem into account 
(Koscielny, 1991; 1998; 2001). 

This chapter presents a method of diagnosing industrial processes in de- 
centralised one-level and multi-level hierarchical structures. In one-level struc- 
tures, particular subsystems (technical centres) are diagnosed by computer- 
based diagnosing units that are attributed to these subsystems. The units are 
coupled by a network and can interchange pieces of data in order to take
into account symptoms observed in other subsystems. Supervisory diagnosing
units do not exist. In hierarchical structures, computer-based units of the first
level diagnose subsystems that are attributed to them, without taking into
account symptoms that appear in the other subsystems. Supervisory units in
the system, however, detect and isolate faults whose symptoms are observed
independently in different subsystems of the lower level.

ul. Sw. A. Boboli 8, 02-525 Warsaw, Poland, e-mail: jmk@mchtr.pw.edu.pl

Fig. 19.1. Model-based diagnostic system



19.2. Decomposition of the diagnostic system 

The diagnostics of industrial processes is composed of two stages: fault
detection and fault isolation (Fig. 1.6). Different methods are used for fault
detection (Chen and Patton, 1999; Frank, 1990; Gertler, 1998; Patton et al.,
1989; 2000). The most effective ones are based on the use of analytical, neural
or fuzzy models of defined parts of the system. The detection algorithm
consists of two parts. In the first one, the value of the residual $r_j$ is calculated
using the system model and process variable values. In the second part, the
residual value is evaluated, and the binary diagnostic signal $s_j$ is generated.
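The two parts of the detection algorithm can be sketched as below. This is a minimal illustration, not the method prescribed by the chapter: the function name is hypothetical, and the comparison of the absolute residual with a fixed threshold is an assumed evaluation rule.

```python
def evaluate_residual(measured, modelled, threshold):
    """Part 1: compute the residual; part 2: turn it into a binary signal."""
    residual = measured - modelled                     # r_j = y - y_model
    signal = 1 if abs(residual) > threshold else 0     # s_j = 1 means a symptom
    return residual, signal


print(evaluate_residual(5.0, 4.0, 0.5))   # (1.0, 1)
print(evaluate_residual(4.25, 4.0, 0.5))  # (0.25, 0)
```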

The fault isolation system is defined by:

• the set $F$ of possible faults understood as destruction events that lower
the quality of the operation of the system or its part:

$F = \{f_k : k = 1, 2, \ldots, K\}$, (19.1)

• the set $S$ of diagnostic signals that are outputs of detection algorithms
realised in the system:

$S = \{s_j : j = 1, 2, \ldots, J\}$, (19.2)

• the diagnostic relation $R_{FS}$ defined on the Cartesian product of the
sets $F$ and $S$:

$R_{FS} \subset F \times S$. (19.3)

The expression $f_k R_{FS} s_j$ means that the test $d_j$ detects the fault $f_k$, i.e.,
the occurrence of the fault $f_k$ causes the appearance of the diagnostic signal
$s_j$ that has a value equal to 1 (symptom). The matrix of the relation $R_{FS}$
is a binary diagnostic matrix. Its element $r(f_k, s_j)$ is defined as follows:

$r(f_k, s_j) = v_j(f_k) = \begin{cases} 0 & \text{if } (f_k, s_j) \notin R_{FS}, \\ 1 & \text{if } (f_k, s_j) \in R_{FS}. \end{cases}$ (19.4)

It is interpreted as the pattern value of the diagnostic signal $s_j$ when the
fault $f_k$ occurs, which is denoted by $v_j(f_k)$.

The relation $R_{FS}$ can be defined by attributing to each one of the tests
a subset of faults that are detected by this test:

$F(s_j) = \{f_k \in F : f_k R_{FS} s_j\}$. (19.5)

If diagnosing in a decentralised structure is to be possible, it is necessary to
decompose the system into subsystems. Such a decomposition can be carried
out arbitrarily or by selecting subsystems of a limited size and the highest
possible degree of independence (Koscielny, 1991; 1998; 2001). Let us assume
that the subsystems have been selected arbitrarily and have separate subsets
of diagnostic signals, i.e., particular diagnostic units realise separate
subsets of diagnostic tests:

$S_m \cap S_n = \emptyset$; $m \neq n$, $m, n = 1, 2, \ldots, N$. (19.6)

In general, subsets of faults detected in these subsystems are not separate:

$F_m \cap F_n \neq \emptyset$; $m \neq n$, $m, n = 1, 2, \ldots, N$, (19.7)

where

$F_n = \bigcup_{j:\, s_j \in S_n} F(s_j)$. (19.8)

The subset of faults detected simultaneously in two subsystems, $m$ and
$n$, is defined as follows:

$F_{m,n} = \{f_k \in F_m \cap F_n\}$. (19.9)

The relations 

^FS = X Sn C RpS (19.10) 

are subsets of the relation Rps for the complete system. 

Each one of the subsystems is defined by the following:

$O_n = \langle F_n, S_n, R_{FS}^n \rangle$. (19.11)
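Equations (19.8) and (19.9) reduce to set unions and intersections, as the following sketch shows. The data are an assumption in the sense that they are transcribed from Example 19.1 later in this chapter (the sets $F(s_j)$ of subsystem 1 and the fault sets of subsystems 2 and 3); the function names are hypothetical.

```python
from itertools import combinations

# F(s_j) for the signals of subsystem 1, as read from Example 19.1.
F_OF_SUBSYSTEM_1 = {1: {2, 4}, 2: {3, 6}, 3: {4, 6}, 4: {1, 5, 6}, 7: {10, 12}}


def faults_of_subsystem(f_of):
    """Eq. (19.8): F_n is the union of F(s_j) over the signals of S_n."""
    return set().union(*f_of.values())


def shared_faults(fault_sets):
    """Eq. (19.9): F_{m,n} = F_m & F_n for every pair of subsystems."""
    return {(m, n): fault_sets[m] & fault_sets[n]
            for m, n in combinations(sorted(fault_sets), 2)}


F = {
    1: faults_of_subsystem(F_OF_SUBSYSTEM_1),
    2: {5, 7, 8, 9, 11, 13, 14, 16, 20},
    3: {11, 15, 17, 18, 19, 20, 21, 22},
}
print(sorted(F[1]))        # [1, 2, 3, 4, 5, 6, 10, 12]
print(shared_faults(F))    # {(1, 2): {5}, (1, 3): set(), (2, 3): {11, 20}}
```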
19.3. Diagnostics in one-level structures

Each diagnostic unit in a decentralised diagnostic structure detects and
isolates faults in a defined subsystem $O_n$. The initial diagnosis $DGN_n^*$ in a
subsystem is defined on the assumption about the existence of single faults,
and shows a subset of faults whose signatures $V_n(f_k)$ are consistent with
the actual values $V_n$ of diagnostic signals:

$DGN_n^* = \{f_k \in F_n : V_n(f_k) = V_n\}$, (19.12)

where

$[V_n(f_k) = V_n] \Leftrightarrow \forall_{j:\, s_j \in S_n} [v_j(f_k) = v_j]$. (19.13)

Here $v_j$ is the real value of the signal $s_j$, and $v_j(f_k) = r(f_k, s_j)$ is the
pattern value defined by the binary diagnostic relation (19.4).

Let us define the set of faults detected exclusively in a given $n$-th
subsystem:

$F_n^W = F_n - \bigcup_{m=1,\, m \neq n}^{N} F_{m,n}$. (19.14)

The diagnosis (19.12) can be presented as the sum of two subsets of faults.
The first subset contains faults detected exclusively in a given subsystem
while the second one contains faults detected jointly in at least two
subsystems:

$DGN_n^* = \{f_k \in F_n^W\} \cup \{f_i \in F_{m,n}\}$, $m \neq n$, $m = 1, 2, \ldots, N$. (19.15)

If all faults indicated in the initial diagnosis $DGN_n^*$ belong to the set
$F_n^W$, then such a diagnosis can be neither specified nor verified using
diagnostic signals realised in other subsystems. Its character is final:

$DGN_n^* \subset F_n^W \Rightarrow DGN_n = DGN_n^*$. (19.16)

When the initial diagnosis indicates faults detected in other subsystems:

$DGN_n^* \not\subset F_n^W$, (19.17)

it can be specified or verified on the grounds of diagnoses generated in other
subsystems for which $F_{m,n} \neq \emptyset$.

Modifying the diagnosis generated in the $n$-th subsystem using diagnostic
results of the $m$-th subsystem can be carried out if the initial diagnosis
(19.12), (19.15) contains elements that belong to the set $F_{m,n}$. If the
initial diagnosis formulated in the $m$-th subsystem does not contain faults
that are indicated in the diagnosis $DGN_n^*$, then - assuming that single faults
exist - they should be ruled out of the final diagnosis of the $n$-th subsystem:

$DGN_n^* \cap DGN_m^* = \emptyset \Rightarrow DGN_n = \{f_k \in F_n^W\}$. (19.18)

When the initial diagnosis generated in the $m$-th subsystem does contain
faults that are indicated by the diagnosis $DGN_n^*$, one should infer that one
of the faults indicated in both subsystems occurred:

$DGN_n^* \cap DGN_m^* \neq \emptyset \Rightarrow DGN_n = DGN_n^* \cap DGN_m^*$. (19.19)

In order to obtain the final diagnosis $DGN_n$, the initial diagnosis should
be modified by using the results of diagnoses formulated in all remaining
subsystems for which $F_{m,n} \neq \emptyset$.

Example 19.1. The binary diagnostic matrix shown as an example in
Table 19.1 corresponds to the centralised diagnostic structure.



Table 19.1. Binary diagnostic matrix for the complete system 






Let us assume that the diagnostic system ought to have three subsystems.
The number of tests in each one of them should not exceed six. Let us assume
the following division of the system into three subsystems:

• Subsystem 1: $S_1 = \{s_1, s_2, s_3, s_4, s_7\}$, $F_1 = \{f_1, f_2, f_3, f_4, f_5, f_6, f_{10}, f_{12}\}$.

Table 19.2. Diagnostic matrix for the subsystem 1

| S/F | $f_1$ | $f_2$ | $f_3$ | $f_4$ | $f_5$ | $f_6$ | $f_{10}$ | $f_{12}$ |
|---|---|---|---|---|---|---|---|---|
| $s_1$ |  | 1 |  | 1 |  |  |  |  |
| $s_2$ |  |  | 1 |  |  | 1 |  |  |
| $s_3$ |  |  |  | 1 |  | 1 |  |  |
| $s_4$ | 1 |  |  |  | 1 | 1 |  |  |
| $s_7$ |  |  |  |  |  |  | 1 | 1 |



• Subsystem 2: $S_2 = \{s_5, s_6, s_8, s_9, s_{11}\}$, $F_2 = \{f_5, f_7, f_8, f_9, f_{11}, f_{13}, f_{14}, f_{16}, f_{20}\}$.



Table 19.3. Diagnostic matrix for the subsystem 2 






• Subsystem 3: $S_3 = \{s_{10}, s_{12}, s_{13}, s_{14}, s_{15}, s_{16}\}$, $F_3 = \{f_{11}, f_{15}, f_{17}, f_{18}, f_{19}, f_{20}, f_{21}, f_{22}\}$.



Table 19.4. Diagnostic matrix for the subsystem 3 



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(a) Let us assume that the following initial diagnoses have been obtained 
in the subsystems:

$DGN_2^* = \{\{f_{16}\}, \{f_{20}\}\}$, $DGN_3^* = \{f_{17}\}$.

Since $f_{20} \in F_{2,3}$, and the diagnosis $DGN_3^*$ does not indicate this fault, the
final diagnoses are as follows:

$DGN_1 = \emptyset$, $DGN_2 = \{f_{16}\}$, $DGN_3 = \{f_{17}\}$.

The diagnosis that concerns the complete system is the sum of the particular
diagnoses. It equals

$DGN = \{\{f_{16}\}, \{f_{17}\}\}$.

(b) Let us assume that the following initial diagnoses have been obtained
in the subsystems:

$DGN_1^* = \{\{f_1\}, \{f_5\}\}$, $DGN_2^* = \{f_5\}$, $DGN_3^* = \{f_{21}\}$.

Since $f_5 \in F_{1,2}$, and the diagnoses $DGN_1^*$ and $DGN_2^*$ indicate this fault,
then

$DGN_1 = \{f_5\}$, $DGN_2 = \{f_5\}$, $DGN_3 = \{f_{21}\}$,

$DGN = \{\{f_5\}, \{f_{21}\}\}$.

(c) Let us assume that the following initial diagnoses have been obtained 
in the subsystems: 

DGN^ = {h}, DGN; = {fu}, DGN; = {fis}. 

Since all faults indicated in the diagnoses belong to the sets $F_n^W$, they
cannot be specified. The diagnosis that concerns the complete system is
equal to

DGN = {{U},{fu},{fis}}- 

This example indicates that diagnosing in decentralised structures allows 
us to isolate multiple faults that occur in different subsystems even if the 
process is carried out on the assumption about the existence of single faults. 
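Cases (a) and (b) of the example can be replayed with a small sketch of the rules (19.16)-(19.19). Two points are assumptions: the function `final_diagnosis` is hypothetical and applies the rules pairwise to each coupled subsystem in turn, and the initial diagnosis of subsystem 1 in case (a) is taken to be empty.

```python
# Fault sets F_n of the three subsystems of Example 19.1.
F = {
    1: {1, 2, 3, 4, 5, 6, 10, 12},
    2: {5, 7, 8, 9, 11, 13, 14, 16, 20},
    3: {11, 15, 17, 18, 19, 20, 21, 22},
}


def final_diagnosis(n, initial):
    """Refine DGN*_n with the initial diagnoses of the coupled subsystems."""
    result = set(initial[n])
    for m in F:
        shared = F[m] & F[n]                 # F_{m,n}
        if m == n or not (result & shared):
            continue                         # nothing to specify, cf. (19.16)
        common = result & initial[m]
        if common:
            result = common                  # fault confirmed by subsystem m, (19.19)
        else:
            result -= shared                 # jointly detected faults ruled out, (19.18)
    return result


def system_diagnosis(initial):
    """Sum of the final diagnoses of all subsystems."""
    return set().union(*(final_diagnosis(n, initial) for n in F))


case_a = {1: set(), 2: {16, 20}, 3: {17}}    # DGN*_1 assumed empty
case_b = {1: {1, 5}, 2: {5}, 3: {21}}
print(sorted(system_diagnosis(case_a)))      # [16, 17]
print(sorted(system_diagnosis(case_b)))      # [5, 21]
```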



19.4. Diagnostics in hierarchical structures 

19.4.1. Hierarchical description of complex diagnostic systems 

Diagnosing can be carried out not only in decentralised one-level structures
but also in hierarchical ones. Computer units of the first level diagnose
exclusively subsystems that are attributed to them. It is assumed that subsystems
of the first level are independent, i.e., the subsets of detected faults and the
subsets of tests are separate. Supervisory units in the system detect and
isolate faults whose symptoms are observed in different subsystems of the lower
level. Units of higher levels, knowing diagnoses formulated in the first-level
units that are connected with them, as well as the results of tests realised
independently, formulate their diagnoses on the basis of tests that detect faults
occurring in two or more subsystems. A method of the hierarchical
description of complex diagnostic systems that is necessary for diagnostics in a
hierarchical structure is presented below.

As a result of system decomposition into technical centres, the
following subsystems have been obtained that have separate subsets of faults:

$F_m \cap F_n = \emptyset$, $m \neq n$, $m, n = 1, \ldots, M_1$, $F = \bigcup_{n=1}^{M_1} F_n$. (19.20)

They are used for the description of particular first-level subsystems in the
form of the following relation:

$O_n^1 = \langle F_n^1, S_n^1, R_{FS}^{1,n} \rangle$. (19.21)

The particular elements of the above are defined as follows:

(a) The subset of diagnostic signals that detect faults exclusively in a
given subsystem, i.e., faults that belong to the subset $F_n$:

$S_n^1 = \{s_j \in S : F(s_j) \subset F_n\}$. (19.22)

(b) The subset of faults detected (exclusively in a given subsystem) by
diagnostic signals that belong to the subset $S_n^1$:

$F_n^1 = \bigcup_{j:\, s_j \in S_n^1} F(s_j)$, $F_n^1 \subset F_n$. (19.23)

(c) The diagnostic relation $R_{FS}^{1,n}$:

$R_{FS}^{1,n} = \{(f_k, s_j) \in R_{FS} : f_k \in F_n^1,\ s_j \in S_n^1\}$. (19.24)

The selection of the second-level subsystems is carried out similarly to
the first level. The set of diagnostic signals that are not used in the first-level
subsystems looks as follows:

$S^2 = S - \bigcup_{n=1}^{M_1} S_n^1$. (19.25)

It contains signals that detect faults in at least two subsystems. Faults
detected by these diagnostic signals are considered at the second level:

$F^2 = \bigcup_{j:\, s_j \in S^2} F(s_j)$. (19.26)

The set $F^2$ should be divided into separate subsets, and then the description
of second-level subsystems should be carried out similarly, according to
the equations (19.22)-(19.24). Such a procedure is carried out in the same
way for successive higher levels of the hierarchical structure. The number of
subsystems at the $h$-th level, denoted by $M_h$, is arbitrarily defined. It usually
corresponds to the technical structure of the system. Each subsystem at a
given $h$-th level is described by the following:

$O_n^h = \langle F_n^h, S_n^h, R_{FS}^{h,n} \rangle$. (19.27)

At the highest $H$-th level, only one subsystem exists that has the following
form:

$O_1^H = \langle F_1^H, S_1^H, R_{FS}^{H,1} \rangle$, (19.28)

and contains:

(a) a subset of diagnostic signals that detect faults belonging to more
than one lower-level subsystem:

$S_1^H = S^{H-1} - \bigcup_{n=1}^{M_{H-1}} S_n^{H-1}$, (19.29)

(b) a subset of faults detected at the highest level:

$F_1^H = \bigcup_{j:\, s_j \in S_1^H} F(s_j)$, (19.30)

(c) a diagnostic relation:

$R_{FS}^{H,1} = \{(f_k, s_j) \in R_{FS} : f_k \in F_1^H,\ s_j \in S_1^H\}$. (19.31)

The above hierarchical structure has the following properties:

• subsets of diagnostic signals are separate in all subsystems:

$S_m^g \cap S_n^h = \emptyset$, $m, n = 1, \ldots, M_h$, $m \neq n$, $g, h = 1, \ldots, H$, (19.32)

• subsets of diagnostic signals are separate, therefore the relations
faults-diagnostic signals are also separate:

$R_{FS}^{g,m} \cap R_{FS}^{h,n} = \emptyset$, $m, n = 1, \ldots, M_h$, $m \neq n$, (19.33)

• subsets of faults detected in subsystems at the same hierarchical level
are separate:

$F_m^h \cap F_n^h = \emptyset$, $m, n = 1, \ldots, M_h$, $m \neq n$, $h = 1, \ldots, H$. (19.34)
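The construction of the first level and of the signal and fault sets passed to the second level, Eqs. (19.22), (19.23), (19.25) and (19.26), can be sketched as follows. The four-signal system and the partition into two technical centres are hypothetical data invented for the illustration, and the function name is an assumption.

```python
def split_levels(f_of, partition):
    """Return (S1, F1, S2, F2) for a partition {n: F_n} of the fault set.

    f_of maps each diagnostic signal to F(s_j), the faults it detects.
    """
    # (19.22): signals whose detected faults all lie in one technical centre.
    s1 = {n: {j for j, faults in f_of.items() if faults <= fn}
          for n, fn in partition.items()}
    # (19.23): faults detected by those first-level signals.
    f1 = {n: set().union(*(f_of[j] for j in s1[n])) for n in partition}
    # (19.25): signals not used at the first level.
    s2 = set(f_of) - set().union(*s1.values())
    # (19.26): faults considered at the second level.
    f2 = set().union(*(f_of[j] for j in s2))
    return s1, f1, s2, f2


# Hypothetical example: two technical centres, four diagnostic signals.
F_OF = {"s1": {"f1", "f2"}, "s2": {"f2"}, "s3": {"f3", "f4"}, "s4": {"f2", "f3"}}
PARTITION = {1: {"f1", "f2"}, 2: {"f3", "f4"}}

S1, F1, S2, F2 = split_levels(F_OF, PARTITION)
print(S1[1] == {"s1", "s2"}, S1[2] == {"s3"})   # True True
print(S2, sorted(F2))                            # {'s4'} ['f2', 'f3']
```

In this toy system the signal s4 detects faults of both centres, so it is the only one left for the supervisory level.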
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The hierarchical diagnosing structure is described by the graph GHSD: 



$GHSD = \langle O, L \rangle$. (19.35)

The following subsystems are the graph's nodes:

$O = \{O_n^h : n = 1, \ldots, M_h,\ h = 1, \ldots, H\}$, (19.36)

and its arches

$L = \{(O_m^g, O_n^h) : m \neq n,\ g, h = 1, \ldots, H,\ g \neq h\}$ (19.37)

connect subsystems that have the common elements of the fault set:

$F_m^g \cap F_n^h \neq \emptyset$. (19.38)

Two subsystems connected by a common arch in the graph GHSD are
called adjoined subsystems. Let us denote subsets of faults detected jointly
in two subsystems of different levels by the following symbol:

$F_{m,n}^{g,h} = F_m^g \cap F_n^h$, $g \neq h$. (19.39)

The hierarchical diagnosing structure presented as an example in Fig. 19.2
has the following parameters: $H = 3$, $M_1 = 6$, $M_2 = 2$, and $M_3 = 1$.
As shown in Fig. 19.2, functional connections in a hierarchical diagnosing
structure, which are defined by the graph arcs, occur not only between units
at neighbouring levels but also between units at distant levels. These
connections result from relationships that exist between larger units of the
process and contain subsets of first-level technical centres. They are
controlled by appropriate supervisory units.

Fig. 19.2. Example of a hierarchical diagnosing structure

The method of the hierarchical description of the diagnostic system
presented in this chapter significantly simplifies the analysis of complex
processes and the synthesis of diagnostic systems. Diagnostic analysis can be
carried out independently for particular technical centres that can be
considered in turn as a set of smaller functional units. Possessing
diagnostic information about particular subsystems, it is easy to define
connections that exist between them, and to formulate the hierarchical
description of the complete system. It is also important that the diagnostic
system can be put into service stage by stage independently for particular
subsystems, beginning with the lowest-level diagnosing units until the whole
hierarchical structure is implemented for the complete system.

Example 19.2. (The definition of a hierarchical diagnostic structure) A two- 
level diagnostic structure will be defined for the system described by the 
binary diagnostic matrix shown in Table 19.1. 

Let us assume that the following separate subsets of faults correspond to
three nodes of a technical process described by the binary diagnostic matrix
shown in Table 19.1:

$$F_1 = \{f_1, \ldots, f_7\}, \quad F_2 = \{f_8, \ldots, f_{14}\}, \quad F_3 = \{f_{15}, \ldots, f_{22}\}.$$

According to the equation (19.22), the following subsets of diagnostic signals 
are defined for the first-level subsystems: 

$$S_1^1 = \{s_1, s_2, s_3, s_4\}, \quad S_2^1 = \{s_7, s_8, s_9\}, \quad S_3^1 = \{s_{12}, s_{13}, s_{14}, s_{15}, s_{16}\}.$$

On the basis of (19.23), the following subsets of faults detected in the 
first-level subsystems are defined: 

$$F_1^1 = \{f_1, f_2, f_3, f_4, f_5, f_6\}, \quad F_2^1 = \{f_8, f_9, f_{10}, f_{11}, f_{12}, f_{13}, f_{14}\},$$

$$F_3^1 = \{f_{17}, f_{18}, f_{19}, f_{20}, f_{21}, f_{22}\}.$$

At the second, supervisory level, the remaining diagnostic signals (19.25) are 
used. 

Since $H = 2$, only one subsystem exists at this level; therefore

$$S_1^2 = S^2 = \{s_5, s_6, s_{10}, s_{11}\}.$$

Then, the following set is defined according to the equation (19.26):

$$F_1^2 = \{f_5, f_7, f_9, f_{11}, f_{13}, f_{14}, f_{15}, f_{16}, f_{18}, f_{20}\}.$$

In Table 19.5, which illustrates the diagnostic relation of the system, some
diagnostic subsystems have been selected. The diagnostic relations (19.24)
for particular first-level subsystems are shown in Tables 19.6, 19.7 and 19.8,
while the diagnostic relation (19.31) for the supervisory subsystem, defined
on the Cartesian product of the sets $F_1^2 \times S_1^2$, is presented in
Table 19.9.

Subsets of faults detected jointly in two different-level subsystems are as 
follows: 

$$F_{1,1}^{1,2} = \{f_5\}, \quad F_{2,1}^{1,2} = \{f_9, f_{11}, f_{13}, f_{14}\}, \quad F_{3,1}^{1,2} = \{f_{18}, f_{20}\}.$$

Figure 19.3 presents the graph that corresponds with a two-level structure 
of the diagnosed system. 
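
The jointly detected subsets of Example 19.2 can be checked with a few set operations. The following Python sketch is only an illustration: faults are represented by their integer indices, and the names `first_level`, `supervisory`, `joint_subsets` and `pairwise_disjoint` are hypothetical helpers introduced here, not part of the described method.

```python
# Fault subsets copied from Example 19.2; a fault f_k is stored as the integer k.
first_level = {
    1: set(range(1, 7)),     # F_1^1 = {f1, ..., f6}
    2: set(range(8, 15)),    # F_2^1 = {f8, ..., f14}
    3: set(range(17, 23)),   # F_3^1 = {f17, ..., f22}
}
supervisory = {5, 7, 9, 11, 13, 14, 15, 16, 18, 20}   # F_1^2


def joint_subsets(lower, upper):
    """Return {n: F_n^1 & F_1^2} for every non-empty intersection, cf. (19.39)."""
    joint = {}
    for n, faults in lower.items():
        common = faults & upper
        if common:               # a non-empty intersection gives an arc of GHSD
            joint[n] = common
    return joint


def pairwise_disjoint(subsets):
    """Check property (19.34): fault subsets of one level do not overlap."""
    seen = set()
    for faults in subsets:
        if seen & faults:
            return False
        seen |= faults
    return True


joint = joint_subsets(first_level, supervisory)
for n in sorted(joint):
    print(f"F_({n},1)^(1,2) =", sorted(joint[n]))
print("first-level subsets disjoint:", pairwise_disjoint(first_level.values()))
```

Every first-level subsystem shares at least one fault with the supervisory subsystem, so in this sketch each of them is joined by an arc with the second-level node.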
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Table 19.5. Diagnostic relation of the system with selected diagnostic subsystems 
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Fig. 19.3. GHSD graph of a two-level diagnostic structure 



19.4.2. Diagnosing in hierarchical structures 

Let us consider diagnosing in a hierarchical structure with the use of the
binary diagnostic matrix, under the assumption that only single faults occur.
Each of the subsystems is defined by (19.27). The initial diagnosis
(formulated in the $n$-th subsystem of the $h$-th level) indicates the
subset of faults whose signatures are consistent with the actual diagnostic
signal values:






(19.40) 
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Let us define the set of faults detected exclusively in a given subsystem: 

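$$F_{n,n}^{h,h} = F_n^h - \bigcup_{(m,g) \neq (n,h)} F_{n,m}^{h,g}. \qquad (19.41)$$

If all of the faults indicated in the initial diagnosis $DGN_n^{h*}$ belong
to the set $F_{n,n}^{h,h}$, then such a diagnosis can be neither made more
specific nor verified on the basis of the diagnostic signals processed in
adjoined subsystems. Its form is final:

$$DGN_n^{h*} \subset F_{n,n}^{h,h} \;\Rightarrow\; DGN_n^h = DGN_n^{h*}. \qquad (19.42)$$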

When the initial diagnosis indicates faults detected in adjoined subsystems
(for which $F_{n,m}^{h,g} \neq \emptyset$):

$$DGN_n^{h*} \not\subset F_{n,n}^{h,h}, \qquad (19.43)$$

then it can be made more specific or verified on the basis of the diagnoses
generated in these subsystems.

The diagnosis (19.40) can be presented as the union of two subsets of faults.
The first one contains only faults that are detected exclusively in a given
subsystem. The other subset contains faults detected jointly in at least two
subsystems:

$$DGN_n^{h*} = \{f_k \in F_{n,n}^{h,h}\} \cup \{f_k \in F_n^h - F_{n,n}^{h,h}\}. \qquad (19.44)$$

The modification of the diagnosis formulated in the $n$-th subsystem at
the $h$-th level on the grounds of the diagnosing results obtained in the
$m$-th subsystem at the $g$-th level is carried out if the initial diagnosis
(19.40) indicates faults that belong to the set $F_{n,m}^{h,g}$. If the
diagnosis $DGN_m^{g*}$ does not indicate faults that have been indicated in
the diagnosis $DGN_n^{h*}$, then, assuming the existence of single faults,
they should be ruled out of the final diagnosis:

$$DGN_n^{h*} \cap DGN_m^{g*} = \emptyset \;\Rightarrow\; DGN_{n/m}^{h/g} = \{f_k \in DGN_n^{h*} - F_{n,m}^{h,g}\}. \qquad (19.45)$$

When the initial diagnosis $DGN_n^{h*}$ indicates faults that are also shown
by the diagnosis $DGN_m^{g*}$, one should infer that one of the faults
indicated in both subsystems occurred:

$$DGN_n^{h*} \cap DGN_m^{g*} \neq \emptyset \;\Rightarrow\; DGN_{n/m}^{h/g} = DGN_n^{h*} \cap DGN_m^{g*}. \qquad (19.46)$$

In order to obtain the final diagnosis $DGN_n^h$, the initial diagnosis
should be modified according to the above formulas on the basis of the
diagnoses obtained in all remaining subsystems for which
$F_{n,m}^{h,g} \neq \emptyset$.

Example 19.3. (Diagnosing in a hierarchical structure) Let us assume that 
the diagnostics of a system that has the binary diagnostic matrix shown in 
Table 19.1 is carried out in the hierarchical structure defined in Example 19.2 
(Table 19.5). The particular subsystems are as follows:

Subsystem $O_1^1$: $S_1^1 = \{s_1, s_2, s_3, s_4\}$, $F_1^1 = \{f_1, f_2, f_3, f_4, f_5, f_6\}$.

Table 19.6. Binary diagnostic matrix for the subsystem $O_1^1$

Subsystem $O_2^1$: $S_2^1 = \{s_7, s_8, s_9\}$, $F_2^1 = \{f_8, f_9, f_{10}, f_{11}, f_{12}, f_{13}, f_{14}\}$.

Table 19.7. Binary diagnostic matrix for the subsystem $O_2^1$

Subsystem $O_3^1$: $S_3^1 = \{s_{12}, s_{13}, s_{14}, s_{15}, s_{16}\}$, $F_3^1 = \{f_{17}, f_{18}, f_{19}, f_{20}, f_{21}, f_{22}\}$.

Table 19.8. Binary diagnostic matrix for the subsystem $O_3^1$

Subsystem $O_1^2$: $S_1^2 = \{s_5, s_6, s_{10}, s_{11}\}$, $F_1^2 = \{f_5, f_7, f_9, f_{11}, f_{13}, f_{14}, f_{15}, f_{16}, f_{18}, f_{20}\}$.

Table 19.9. Binary diagnostic matrix for the subsystem $O_1^2$



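S/F  | f5 | f7 | f9 | f11 | f13 | f14 | f15 | f16 | f18 | f20
s5   | 1  | 1  | 1  |     |     |     |     |     |     |
s6   |    | 1  |    |     | 1   |     |     |     |     |
s10  |    |    |    | 1   |     |     | 1   |     | 1   |
s11  |    |    |    |     |     | 1   |     | 1   |     | 1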







(a) Let us assume that the following initial diagnoses have been obtained 
in these subsystems: 

$$DGN_1^{1*} = \emptyset, \quad DGN_2^{1*} = \{\{f_9\}, \{f_{14}\}\},$$

$$DGN_3^{1*} = \{f_{19}\}, \quad DGN_1^{2*} = \{\{f_5\}, \{f_9\}\}.$$

Since $f_{19} \in F_{3,3}^{1,1}$, $DGN_3^{1*} = \{f_{19}\}$ is the final
diagnosis, and so is the diagnosis $DGN_1^{1*} = \emptyset$:

$$DGN_3^1 = \{f_{19}\}, \quad DGN_1^1 = \emptyset.$$

However, $f_9, f_{14} \in F_{2,1}^{1,2}$; therefore, according to the
equation (19.46) and assuming the existence of single faults, the final
diagnoses in these subsystems are as follows:

$$DGN_2^1 = \{f_9\}, \quad DGN_1^2 = \{f_9\}.$$

The diagnosis that concerns the complete system is the union of the
particular diagnoses:

$$DGN = \{\{f_9\}, \{f_{19}\}\}.$$

(b) Let us assume that the following initial diagnoses have been obtained 
in these subsystems: 

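$$DGN_1^{1*} = \{\{f_1\}, \{f_5\}\}, \quad DGN_2^{1*} = \emptyset, \quad DGN_3^{1*} = \emptyset, \quad DGN_1^{2*} = \emptyset.$$

Since the fault $f_5 \in F_{1,1}^{1,2}$ has not been indicated in the
diagnosis $DGN_1^{2*}$, one should infer, according to the equation (19.45),
that the fault $f_1$ occurred. The final diagnoses are as follows:

$$DGN_1^1 = \{f_1\}, \quad DGN_2^1 = \emptyset, \quad DGN_3^1 = \emptyset, \quad DGN_1^2 = \emptyset.$$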

(c) Let us assume that the following initial diagnoses have been obtained 
in these subsystems: 

$$DGN_1^{1*} = \{f_4\}, \quad DGN_2^{1*} = \emptyset, \quad DGN_3^{1*} = \{\{f_{18}\}, \{f_{21}\}\},$$

DGNl* = 

Since $f_4 \in F_{1,1}^{1,1}$, the initial diagnosis in this subsystem is at
the same time the final diagnosis. The fault $f_{18}$ belongs to the set
$F_{3,1}^{1,2}$. Inference under the single-fault assumption leads to the
conclusion that this fault occurred because it was indicated in the initial
diagnoses of both subsystems. Therefore,

$$DGN_1^1 = \{f_4\}, \quad DGN_2^1 = \emptyset, \quad DGN_3^1 = \{f_{18}\}, \quad DGN_1^2 = \{f_{18}\}.$$

The diagnosis that concerns the complete system is the union of the
particular diagnoses. It equals

$$DGN = \{\{f_4\}, \{f_{18}\}\}.$$
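
The two inference rules used in this example can be sketched in a few lines of Python. This is a minimal illustration under the single-fault assumption, not the authors' implementation: faults are integer indices, diagnoses are plain sets, and the function names `refine` and `final_diagnosis` are hypothetical. In the second call below, the supervisory initial diagnosis `{18}` is an assumed value chosen purely for illustration.

```python
def refine(initial, joint, neighbour_initial):
    """Modify one initial diagnosis using one adjoined subsystem.

    initial           -- initial diagnosis of the subsystem (set of fault indices)
    joint             -- faults detected jointly with the adjoined subsystem
    neighbour_initial -- initial diagnosis of the adjoined subsystem
    """
    if not initial & joint:
        return set(initial)          # nothing this neighbour could verify
    common = initial & neighbour_initial
    if common:
        return common                # rule (19.46): keep faults indicated in both
    return initial - joint           # rule (19.45): drop unconfirmed joint faults


def final_diagnosis(initial, neighbours):
    """Apply refine() in turn for every (joint, neighbour_initial) pair."""
    result = set(initial)
    for joint, neighbour_initial in neighbours:
        result = refine(result, joint, neighbour_initial)
    return result


# Case (b): subsystem 1 of level 1 indicates f1 or f5; the supervisory
# subsystem, which also detects f5, indicates nothing, so f5 is ruled out.
print(final_diagnosis({1, 5}, [({5}, set())]))

# Rule (19.46) with assumed data: subsystem 3 indicates f18 or f21, and the
# supervisory diagnosis is assumed to contain f18.
print(final_diagnosis({18, 21}, [({18, 20}, {18})]))
```

A diagnosis that contains no jointly detected fault is returned unchanged, which corresponds to the case (19.42).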
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19.5. Summary 

The rules of diagnostic inference in decentralised one-level and hierarchical 
structures are the same. It is sufficient to compare the equations (19.18) 
and (19.45), as well as (19.19) and (19.46). In order to obtain the final diag- 
nosis in a given system, one should use partial diagnoses in all subsystems 
that are connected with the system. Two subsystems are connected (adjoined) 
if the intersection of the subsets of faults that are detected by them is not an 
empty set. 

Examples 19.1 and 19.2 showed that diagnosing in decentralised structures
allows us to isolate multiple faults even if it is carried out under the
single-fault assumption. Difficulties with the correct formulation of
diagnoses appear only when multiple faults exist in one of the subsystems,
or in the set of faults detected simultaneously in two subsystems. In order
to formulate correct diagnoses in such cases, rules of inference that take
into account system states with multiple faults should be applied
(Koscielny, 1991; 1998; 2001).

The main advantages of decentralised diagnosing include:

• the adaptation of the diagnostic system structure to the decentralised
structure of industrial process control,

• the distribution of diagnostic functions among many computers operating
in parallel,

• a lower number of faults recognised by particular computer units, which
significantly lowers calculation costs and makes the assumption about the
existence of single faults in subsystems more justifiable,

• limiting the effects of faults of particular diagnosing units in
comparison with the centralised structure,

• a better adaptation of diagnostic information to the needs of the process
operators,

• the possibility of putting the diagnosing system into service stage by
stage, in succession for particular subsystems.
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Chapter 20 



DETECTION AND ISOLATION OF
MANOEUVRES IN ADAPTIVE TRACKING
FILTERING BASED ON MULTIPLE MODEL
SWITCHING



Zdzisław KOWALCZUK*, Mirosław SANKOWSKI**



20.1. Introduction 

Tracking filters are the principal part of radar data processing systems. In 
fact, the tracking filter is a state estimator (Anderson and Moore, 1979) of 
the object (target) being tracked. Its task is to process radar measurements 
of motion parameters of such a target in order to reduce measurement errors 
by means of time averaging, to estimate the object’s velocity and acceleration 
and to predict its future positions (Blackman and Popoli, 1999). 

The problem of designing tracking filters, and tracking systems in general,
has been investigated since World War II. The results of the research were
summarised in several monographs, e.g., (Bar-Shalom and Fortmann, 1988;
Bar-Shalom and Rong Li, 1995; Bar-Shalom et al., 1998; Blackman, 1986;
Blackman and Popoli, 1999; Farina and Studer, 1985). Since the end of the
1960s, tracking filters have been principally supported by Kalman filtering
theory (Anderson and Moore, 1979), which also constitutes the design basis
for the approach presented in this work.

The basic problem of tracking manoeuvring moving objects (e.g. aircraft,
ships) is the unpredictability of object manoeuvres with respect to the time
of their occurrence, their duration and the type of their trajectory.
Therefore a dynamic model of the manoeuvring target trajectory is in general
non-stationary, regarding both its structure and its parameters.

Different solutions to the problem of tracking manoeuvring targets have 
been considered in the literature on the subject. Singer (1970) proposed a 
Kalman Filter (KF) for the object acceleration modelled as a first-order auto- 
regressive process. It has been observed that such a filter allows tracking the 
targets during manoeuvres, but its accuracy for non-manoeuvring fractions 
of the target trajectory is worse than the accuracy of a KF based on a model 
of a uniform motion. It is thus expected that better results can be obtained 
by means of adaptive state estimators. 

From amongst many methods of adaptive tracking, Multiple-Model (MM) 
methods have become quite popular during the last 20 years. An MM filter 
is based on a set of kinematic models, which describe certain types of tra- 
jectories and constitute the basis for a set of state estimators. For a suitable 
description of the most important principles of MM estimation, a simple 
classification of the existing approaches should be made. Principal rules of 
a given MM estimation method can be easily recognised by answering two 
basic questions: (1) How do partial estimators work? (2) How is the final 
estimate computed? The first question leads to a distinction between the 
competitive and cooperative estimation schemes and to the issue of exchang- 
ing information between component estimators. The other question refers to 
the problem of how to combine several estimates obtained from partial filters 
into one estimate. This problem is usually solved by selecting the estimate of 
the most likely filter (an exclusive approach) or by appropriately combining 
(mixing) all component estimates (a non-exclusive approach). In view of the
above classification, the following four basic approaches to MM estimation
problems can be recognised: cooperative, exclusive (Bar-Shalom and Birmiwal,
1982; Roecker and McGillem, 1989); cooperative, non-exclusive (Blom and
Bar-Shalom, 1988); competitive, exclusive (Easthope and Keys, 1994); and
competitive, non-exclusive (Magill, 1965).

In this chapter we shall consider certain diagnostic aspects (Kerr, 1989) 
of the adaptive MM tracking of manoeuvring targets. We shall also assume 
that a regular movement of objects fits the model of a straight-line uniform 
motion (the Constant Velocity model, CV), while manoeuvres appear rarely. 
This assumption simply means that the objects considered are not supposed 
to manoeuvre permanently. By constructing an appropriate set of classes 
of physical models of manoeuvres, including the model of a straight-line uni- 
formly variable motion (the Constant Acceleration model, CA) and the model 
of a circular motion with a constant angular velocity (the Coordinated Turn 
motion, CT), it is possible to describe how different manoeuvres influence 
the state estimation process performed by a Kalman filter, based on a non- 
manoeuvre model of motion. Such an influence can be directly observed in a 
bias of the innovation process of the filter. 
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An Input Estimation (IE) algorithm for the estimation of an unknown 
control input signal (Chan et al., 1979) makes it possible to estimate the
value of constant longitudinal acceleration/deceleration influencing the
moving object and to correct the state estimator (KF) based on the model of
a straight-line uniform motion or to initiate another state estimator which
is based on a straight-line uniformly variable motion (Park et al., 1995).

An IE-based method of estimating the constant centripetal acceleration in a
model of a circular manoeuvre was proposed by Kowalczuk et al. (1997). This
methodology was later developed (Kowalczuk and Sankowski, 2000a; Sankowski
and Kowalczuk, 2001) and extended to a computer algorithm, which allows
detecting the occurrence of a manoeuvre, estimating its onset time,
classifying its trajectory within the analysed classes of manoeuvre
trajectories (isolation), and estimating trajectory parameters.

This method of the Detection-Isolation-Estimation (DIE) of manoeuvres 
allows designing an adaptive multiple-model state estimator of the coopera- 
tive, exclusive type, in which only one of the models considered is assumed 
to be true for a given time interval (Kowalczuk and Sankowski, 2001). The 
scheme of such an estimation system is shown in Fig. 20.1. It consists of 
DIE algorithms, a Manoeuvre-End Detector (MED) and three KFs based on 
CV, CA and CT models. This approach belongs to the group of approaches
commonly referred to as hard-decision or hard-switching methods.




Fig. 20.1. Exclusive multiple-model filter 

The assumed model of the radar measurement process is defined in
Section 20.2. Section 20.3 is devoted to the modelling of object movement
trajectories. For the analysed class of flying objects (civil aircraft), a
movement trajectory is described as a superposition of uniform motions (CV)
and straight-line uniformly variable motions (CA) or coordinated turns (CT)
interpreted as sporadic manoeuvres. The method of input estimation is
described in Section 20.5. This method is based on the analysis of the
innovation process of the KF presuming the CV model is given. The analysis
is performed for the detection of a modelling mismatch, which occurs when
the real target trajectory differs from the assumed CV model. Section 20.6
presents an original method which allows detecting the occurrence of a
manoeuvre, estimating its onset time, isolating the type of its trajectory,
and identifying the trajectory parameters. This method utilises the IE
algorithm, certain physical (analytical) models of movement trajectories in
the Cartesian reference system, and an iterative generalised least-squares
method for the identification of the normal acceleration in the non-linear
model of the coordinated turn. An evaluation of the proposed approach is
given in Section 20.7, taking into consideration the properties of the
movement models, the reliability of the detection and isolation procedures
as well as the accuracy of parametric identification. The effectiveness of
the method is assessed by means of computer simulations and also
analytically. The last section contains a summary and some conclusions.



20.2. Model of the measurement process 

To facilitate the forthcoming discussion two basic definitions of the plot and 
the track will be introduced. 

Definition 20.1 A plot is a radar extractor output (measurement with all 
estimated coordinates) that exceeds a certain detection threshold. 

Note that, in general, such a plot can, afterwards, be interpreted as a true 
target detection or a false alarm. In this work, however, the problem of false 
alarms is not considered and all the processed plots originate in the target 
being tracked. Therefore, in the following text the notions of the plot and the 
measurement are used as synonyms. 

Definition 20.2 A track is a set of plots over a time interval assumed to have
a common target origin in which the target state is estimated (Bar-Shalom,
2001).

There is a fundamental difference between the track and the trajectory. 
The latter refers to a real motion of the target and can be described as the 
evolution of the target state (motion parameters) according to a given dy- 
namic model. The track is a representation of the trajectory in the system 
via the projection of the target motion parameters into a sensor space (mea- 
surement process). The amount of information about the target trajectory 
preserved in a corresponding track is bounded due to a limited number of 
associated plots and to measurement errors. Additional uncertainty is related 
to the track when problems of multiple targets and/or false alarms are
considered. In such a case, due to a limited capability of sensors, plots of
different origins can be misinterpreted as one track (Kowalczuk et al., 2002).

In this work it is assumed that all plots (measurements) constituting the
track originate in a common target. As a result, the problem of radar tracking
is reduced to that of state estimation.



20.2.1. Sampling period 

The radar measurement process has a discrete-time nature. A time interval 
between successive measurements originating in the same object is defined as 

$$T_n = t_{n+1} - t_n. \qquad (20.1)$$

The time interval Tn can be interpreted as a temporal sampling period which 
can vary in time and depends, in particular, on the radar Antenna Revolution 
Time (ART) and on the speed and course of the object. Moreover, due to a 
limited probability of detection Pd < 1, a sudden fadeout of the measurement 
signal is possible which, when it does occur, increases the sampling period 
Tn. 

In order to simplify the mathematical notation, in the following the time
instants $t_n$ will be marked by their indices $n$. In this case the set of
pairs $\{i, T_i\}$, $i = 1, \ldots, n$, constitutes a complete description of
the time instant $t_{n+1}$, which can be calculated according to

$$t_{n+1} = t_1 + \sum_{i=1}^{n} T_i \qquad (20.2)$$

for $n > 0$, where $t_1$ denotes the first discrete-time instant referring to
the first measurement associated with a given track.
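
As a small illustration of (20.2), the measurement instants can be rebuilt from the sequence of sampling periods with a running sum. The Python sketch below is only an example: the function name `time_instants` and the numeric values (a 4 s antenna revolution time with one missed detection) are assumptions made for the illustration.

```python
from itertools import accumulate


def time_instants(t1, periods):
    """Return [t_1, t_2, ..., t_{n+1}] for sampling periods [T_1, ..., T_n]."""
    # accumulate(..., initial=t1) yields t1, t1 + T_1, t1 + T_1 + T_2, ...
    return list(accumulate(periods, initial=t1))


# Assumed data: one plot is lost after the second scan, so T_3 is doubled.
periods = [4.0, 4.0, 8.0, 4.0]
print(time_instants(0.0, periods))
```

The doubled third period shows how a fadeout of the measurement signal lengthens the temporal sampling period without changing the way the instants are computed.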



20.2.2. Radar measurements 



It is assumed that a radar provides unbiased measurements of the object
position (plots) in polar coordinates (slant range $\tilde{\varrho}[n]$ and
azimuth $\tilde{\alpha}[n]$, where the tilde denotes a measurement variable)
in discrete-time instants $n$.

The polar Range-Azimuth (RA) coordinate system originates in the radar
location, with the line of sight $\alpha = 0$ pointing to the north. The
measurements are assumed to be disturbed by additive stationary independent
Gaussian-distributed random processes (noises) $r_\varrho$ and $r_\alpha$
with zero expectations and variances $\sigma_\varrho^2$ and
$\sigma_\alpha^2$, respectively. Such measurements constitute the following
2-dimensional (2D) measurement vector:

$$\mathbf{z}[n] = \begin{bmatrix} \tilde{\varrho}[n] \\ \tilde{\alpha}[n] \end{bmatrix} = \begin{bmatrix} \varrho[n] \\ \alpha[n] \end{bmatrix} + \begin{bmatrix} r_\varrho[n] \\ r_\alpha[n] \end{bmatrix}, \qquad (20.3)$$

characterised by a diagonal covariance matrix of measurement noise:

$$\mathbf{R} = \begin{bmatrix} \sigma_\varrho^2 & 0 \\ 0 & \sigma_\alpha^2 \end{bmatrix}. \qquad (20.4)$$
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20.2.3. Measurement equation 

The use of the sensor coordinate frame as the basis for the modelling and 
design of KFs leads to a convenient linear measurement equation. In the case 
of the modelling of aircraft trajectories for radar tracking systems, this ap- 
proach is chosen relatively rarely because of the non-linearity of the resulting 
models in the polar RA frame. The common method for such systems is to 
use a local Cartesian East-North (EN) coordinate system for modelling sys- 
tem dynamics and a first-order approximation of measurement non-linearity. 
The EN coordinate system has its origin in the radar location and is defined 
by two axes: an x axis pointing to the East (E) and a y axis pointing to the 
North (N). 

The above principle can be implemented in the converted measurement 
method (Blackman and Popoli, 1999), which is based on a transformation of 
the measured coordinates from the RA coordinate system to the EN frame: 



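$$\mathbf{y}[n] = h(\mathbf{z}[n]) = \begin{bmatrix} \tilde{\varrho}[n] \sin(\tilde{\alpha}[n]) \\ \tilde{\varrho}[n] \cos(\tilde{\alpha}[n]) \end{bmatrix}. \qquad (20.5)$$

The resulting pseudo-measurement (converted measurement) vector

$$\mathbf{y}[n] = \begin{bmatrix} \tilde{x}[n] \\ \tilde{y}[n] \end{bmatrix} \qquad (20.6)$$

of the dimension $d_y = 2$ will henceforth be referred to as the measurement
vector described as

$$\mathbf{y}[n] = \mathbf{p}[n] + \mathbf{e}[n], \qquad (20.7)$$

where

$$\mathbf{p}[n] = \begin{bmatrix} x[n] \\ y[n] \end{bmatrix}, \quad \mathbf{e}[n] = \begin{bmatrix} e_x[n] \\ e_y[n] \end{bmatrix}. \qquad (20.8)$$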



The quantities $x[n]$ and $y[n]$, constituting the true position vector
$\mathbf{p}[n]$, describe a real object position, while $e_x[n]$ and $e_y[n]$
can be interpreted as projections of the measurement errors $r_\varrho[n]$
and $r_\alpha[n]$ into the Cartesian frame. Since the transformation (20.5)
is non-linear, the random variables $e_x[n]$ and $e_y[n]$ are not Gaussian in
general. Nevertheless, it is very convenient to assume the Gaussian
distribution of the measurement errors in the Cartesian coordinate system.
With this assumption the covariance matrix $\mathbf{E}[n]$ of the measurement
errors in the Cartesian frame can be approximated by the linearization
of (20.5):

$$\mathbf{E}[n] = \mathrm{cov}\,[\mathbf{e}[n], \mathbf{e}[n]] = \mathbf{H}[n] \mathbf{R} \mathbf{H}^T[n], \qquad (20.9)$$

where $\mathbf{R}$ is defined in (20.4), while $\mathbf{H}[n]$ is the
Jacobian matrix of the transformation $h(\cdot)$ from the polar RA
coordinates to the Cartesian EN ones, evaluated for the measurements
$\mathbf{z}[n]$ in the RA frame.
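
The converted measurement (20.5) and the linearised covariance (20.9) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' code: the function name `convert_measurement`, the use of metres and radians, and the numeric noise levels are all chosen here for the example. The Jacobian is obtained by differentiating (20.5) with respect to range and azimuth.

```python
import math


def convert_measurement(rho, alpha, var_rho, var_alpha):
    """Convert a polar plot (rho, alpha) to EN coordinates; return (y, E).

    alpha is measured from the north, so x = rho*sin(alpha) (east) and
    y = rho*cos(alpha) (north), as in (20.5).
    """
    s, c = math.sin(alpha), math.cos(alpha)
    y = (rho * s, rho * c)
    # Jacobian of h(.) with respect to (rho, alpha)
    H = ((s, rho * c),
         (c, -rho * s))
    R = (var_rho, var_alpha)      # diagonal of the noise covariance (20.4)
    # E = H R H^T, written out for a diagonal R, cf. (20.9)
    E = [[sum(H[i][k] * R[k] * H[j][k] for k in range(2)) for j in range(2)]
         for i in range(2)]
    return y, E


# Assumed example: target 10 km due east, 50 m range and 2 mrad azimuth errors.
y, E = convert_measurement(10_000.0, math.pi / 2, 50.0 ** 2, 0.002 ** 2)
print(y)
print(E)
```

For a target due east the range error maps almost entirely onto the x axis and the azimuth error onto the y axis, so the resulting covariance is close to diagonal; for other bearings the off-diagonal terms are non-zero.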
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20.3. Modelling the target movement trajectory 

In this section discrete-time dynamic models of the assumed possible aircraft
trajectories are derived. As is shown in Fig. 20.2, the coordinates $x(t)$ and
$y(t)$ denote the target position at time $t$ in the local Cartesian EN
reference system.




Fig. 20.2. Basic parameters of the 2D motion model in the EN reference frame 

The quantities $v(t)$ and $\psi(t)$ denote instantaneous values of the
tangential velocity and the course of the target, respectively. Thus the pair
$\{x(t), y(t)\}$ denotes a 2D position of the target, while the pair
$\{v(t), \psi(t)\}$ will be referred to as the polar velocity.



20.3.1. Kinematic model of the planar curvilinear motion 

In this paragraph a model of the planar curvilinear motion (CLM) in the 
Cartesian EN coordinate system is under consideration. 

The velocity of the target can be described directly in the Cartesian
coordinates and related to the polar velocity:

$$\dot{x}(t) = v(t) \sin(\psi(t)), \qquad (20.10)$$

$$\dot{y}(t) = v(t) \cos(\psi(t)), \qquad (20.11)$$

where

$$v(t) = \sqrt{\dot{x}^2(t) + \dot{y}^2(t)}. \qquad (20.12)$$

The equations (20.10)-(20.11) can be rewritten in a vectorised form:

$$\mathbf{v}(t) = v(t)\, \mathbf{u}_t(t), \qquad (20.13)$$

where

$$\mathbf{v}(t) = \begin{bmatrix} \dot{x}(t) \\ \dot{y}(t) \end{bmatrix} \qquad (20.14)$$

is the Cartesian velocity vector, and

$$\mathbf{u}_t(t) = \begin{bmatrix} \sin(\psi(t)) \\ \cos(\psi(t)) \end{bmatrix} \qquad (20.15)$$

denotes a unit vector (versor) tangential to the target trajectory.
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The differentiation of the equations (20.10)-(20.11) results in the following 
second-order differential equations:

$$\ddot{x}(t) = \frac{d}{dt}\{v(t) \sin(\psi(t))\} = \dot{v}(t) \sin(\psi(t)) + v(t) \dot{\psi}(t) \cos(\psi(t)), \qquad (20.16)$$

$$\ddot{y}(t) = \frac{d}{dt}\{v(t) \cos(\psi(t))\} = \dot{v}(t) \cos(\psi(t)) - v(t) \dot{\psi}(t) \sin(\psi(t)), \qquad (20.17)$$

describing a curvilinear target motion. This non-uniform motion results from
an unbalanced force (Newton's second law), which in the model (20.16)-(20.17)
is decomposed into two perpendicular forces acting on the target body along
(tangential force) and across (normal force) the trajectory. The result of
this influence can be described by the tangential (longitudinal) acceleration
$a_t(t)$ and the normal acceleration $a_n(t)$, defined as

$$a_t(t) = \dot{v}(t), \qquad (20.18)$$

$$a_n(t) = v(t) \dot{\psi}(t) = v(t) \omega(t), \qquad (20.19)$$

where $\omega(t)$ is the angular velocity or the turn rate. The above
accelerations are shown in Fig. 20.3.

Fig. 20.3. Geometry of curvilinear motions

Consequently, the equations (20.16)-(20.17) can be written in a vectorised
form:

$$\mathbf{a}(t) = a_t(t)\, \mathbf{u}_t(t) + a_n(t)\, \mathbf{u}_n(t), \qquad (20.20)$$

where

$$\mathbf{a}(t) = \begin{bmatrix} \ddot{x}(t) \\ \ddot{y}(t) \end{bmatrix} \qquad (20.21)$$

is an acceleration vector, while

$$\mathbf{u}_n(t) = \begin{bmatrix} \cos(\psi(t)) \\ -\sin(\psi(t)) \end{bmatrix} \qquad (20.22)$$

is a versor normal to the target trajectory.
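
The decomposition of the acceleration into its tangential and normal parts can be illustrated with a short Python sketch. The function name `cartesian_acceleration` and the numeric values are assumptions made for the example; the formulas follow the tangential versor (sin ψ, cos ψ), the normal versor (cos ψ, -sin ψ) and the relation a_n = v ω given above.

```python
import math


def cartesian_acceleration(v, psi, a_t, omega):
    """Return (x_ddot, y_ddot) as a_t*u_t + a_n*u_n with a_n = v*omega."""
    a_n = v * omega                            # normal acceleration
    u_t = (math.sin(psi), math.cos(psi))       # versor along the trajectory
    u_n = (math.cos(psi), -math.sin(psi))      # versor across the trajectory
    return (a_t * u_t[0] + a_n * u_n[0],
            a_t * u_t[1] + a_n * u_n[1])


# Assumed example: target heading north (psi = 0) at 100 m/s, speeding up
# by 2 m/s^2 while turning at 0.1 rad/s.
print(cartesian_acceleration(100.0, 0.0, 2.0, 0.1))
```

With the course pointing north, the tangential part appears only in the y component and the normal part only in the x component, which matches the geometry of the EN frame.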
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20.3.2. Basic assumptions 

Based on the characteristics of civil aircraft (EUROCONTROL, 1993), it is
assumed that the moving object trajectory consists of straight-line motions
(travel paths) and some manoeuvres. Precisely, we assume that the object
moves according to the model of the straight-line uniform motion and then,
at the time instant $t_0$, it starts to manoeuvre according to a certain
physical model. This manoeuvre lasts until the time moment $\bar{t}_0$.

In order to classify the models of the object's motion, three types of
trajectories are associated with the set of hypotheses $\{H_j\}$,
$j \in \{0, 1, 2\}$ (Kowalczuk and Sankowski, 2001; Sankowski and Kowalczuk,
2001), including:

• the hypothesis H0 (a uniform motion at a constant velocity: CV), meaning
that the object travels along a straight line; this model is a special case
of the CLM model for $a_t(t) = 0$ and $a_n(t) = 0$,

• the hypothesis H1 (a uniform speed change with a constant acceleration:
CA), meaning that the object moves along a straight line with a constant
tangential acceleration, for $t_0 < t < \bar{t}_0$; this model is a special
case of the CLM model for $a_t(t) = a_t = \mathrm{const}$ and $a_n(t) = 0$,

• the hypothesis H2 (a standard or coordinated turn: CT), meaning that
the object manoeuvres on a circular path with a constant normal acceleration,
for $t_0 < t < \bar{t}_0$; this model is a special case of the CLM model for
$a_n(t) = a_n = \mathrm{const}$ and $a_t(t) = 0$.

Table 20.1 summarises the behaviour of the basic parameters of the object
motion (e.g. position, velocity, course) for trajectories related to the
analysed hypotheses (models). The additional hypothesis H0' denotes a
non-moving object and is included in Table 20.1 for reference.

Table 20.1. Parameters of the object's trajectories within the particular
hypotheses

Parameter                           | H0'       | H0       | H1       | H2
position $p(t)$                     | constant  | variable | variable | variable
tangential velocity $v(t)$          | zero      | constant | variable | constant
course $\psi(t)$                    | undefined | constant | constant | variable
tangential acceleration $a_t(t)$    | zero      | zero     | constant | zero
normal acceleration $a_n(t)$        | zero      | zero     | zero     | constant
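
The qualitative behaviour listed in Table 20.1 can be reproduced by integrating the curvilinear motion model numerically. The Python sketch below is a rough illustration, not part of the described method: it uses a simple Euler scheme, and the names `HYPOTHESES` and `simulate` as well as all numeric values (initial speed, accelerations, step size) are assumptions chosen for the example.

```python
import math

# (a_t, a_n) for the three hypotheses; the magnitudes are illustrative only.
HYPOTHESES = {"H0": (0.0, 0.0), "H1": (2.0, 0.0), "H2": (0.0, 5.0)}


def simulate(a_t, a_n, v0=200.0, psi0=0.0, dt=0.1, steps=100):
    """Euler integration of the planar motion; returns final (x, y, v, psi)."""
    x = y = 0.0
    v, psi = v0, psi0
    for _ in range(steps):
        x += v * math.sin(psi) * dt     # east velocity component
        y += v * math.cos(psi) * dt     # north velocity component
        v += a_t * dt                   # tangential acceleration changes speed
        psi += (a_n / v) * dt           # turn rate omega = a_n / v changes course
    return x, y, v, psi


for name, (a_t, a_n) in HYPOTHESES.items():
    x, y, v, psi = simulate(a_t, a_n)
    print(f"{name}: v = {v:.1f} m/s, course = {math.degrees(psi):.2f} deg")
```

In this sketch the speed changes only under H1 and the course changes only under H2, in agreement with the corresponding columns of the table.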



In Fig. 20.4 a scheme of the time reference framework is defined for the
manoeuvring aircraft trajectory (upper part) and the measurement process
(lower part). The two discrete-time instants $n_0$ and $\bar{n}_0$ are
associated with the manoeuvre onset time $t_0$ and the manoeuvre end time
$\bar{t}_0$, respectively. In fact, there is no guarantee that any of the
discrete instants (samples) matches exactly either $t_0$ or $\bar{t}_0$.
Therefore, in the following the symbols $n_0$ and $\bar{n}_0$ will be
interpreted as the discrete-time instants nearest to their respective
moments $t_0$ and $\bar{t}_0$.

Fig. 20.4. Time reference framework of the aircraft trajectory and its
corresponding track

Figure 20.5 explains the relationship between the object’s trajectories 
defined within the assumed set of motion models. In this figure the numbers 
0/1, 0/2, 1/0, 2/0 indicate the direction of transitions between the respective 
hypotheses. 




Fig. 20.5. Graph of possible transitions between the trajectories (hypotheses)



There is an additional subset of analytical manoeuvre models that can
be isolated within the motion-model set given above. This subclass is
composed of the hypotheses H1 and H2, which can be associated with a universal
hypothesis of the manoeuvre: $HM = \{H1, H2\}$.

20.3.3. Model of the uniform motion 

The continuous-time model of the uniform motion is based on the hypothesis 
of the straight-line motion at a Constant Velocity (CV) in the Cartesian 
coordinates: 

$$a(t) = 0_{2\times 1}, \tag{20.23}$$

for $t < t_0$ or $t > t_0'$. Moreover, $p(t_U) = p_U$ and $v(t_U) = v_U$,
with $t_U$ denoting the time moment referring to the start of the
uniform-motion part of the target trajectory.

Above and in the following, $0_{r\times c}$ and $I_{r\times c}$ denote the
null and identity matrices, respectively. The subscript $(r \times c)$ shows
the size of the matrix having $r$ rows and $c$ columns.

The equation (20.23) can be rewritten in the state-space form:

$$\dot{\mathbf{x}}(t) = F_t\,\mathbf{x}(t), \tag{20.24}$$

where

$$\mathbf{x}(t) = \begin{bmatrix} p(t) \\ v(t) \end{bmatrix}, \quad
p(t) = \begin{bmatrix} x(t) \\ y(t) \end{bmatrix}, \quad
F_t = \begin{bmatrix} 0_{2\times 2} & I_{2\times 2} \\ 0_{2\times 2} & 0_{2\times 2} \end{bmatrix}. \tag{20.25}$$

In the above, $\mathbf{x}(t)$ is a state vector and $F_t$ is a
continuous-time state transition matrix.

The discretization of the model (20.24) at the sampling period (time
interval) $T_n$ leads to a fundamental transition matrix:

$$F[n+1,n] = F[t_{n+1},t_n] = \exp(T_n F_t), \tag{20.26}$$

along with a second-order discrete-time linear state-space model, which can
be enhanced with an additional forcing input $w[n]$ and represented by the
following state and observation equations (see Section 20.2):

$$\mathbf{x}[n+1] = F[n+1,n]\,\mathbf{x}[n] + B[n+1,n]\,w[n], \tag{20.27}$$

$$\mathbf{y}[n] = C\,\mathbf{x}[n] + e[n], \tag{20.28}$$

where

$$\mathbf{x}[n] = \begin{bmatrix} p[n] \\ v[n] \end{bmatrix}, \quad
v[n] = \begin{bmatrix} v_x[n] \\ v_y[n] \end{bmatrix}, \tag{20.29}$$

$$F[n+1,n] = \begin{bmatrix} I_{2\times 2} & T_n I_{2\times 2} \\ 0_{2\times 2} & I_{2\times 2} \end{bmatrix}, \quad
B[n+1,n] = \begin{bmatrix} \tfrac{T_n^2}{2} I_{2\times 2} \\ T_n I_{2\times 2} \end{bmatrix}, \quad
w[n] = \begin{bmatrix} w_x[n] \\ w_y[n] \end{bmatrix}, \tag{20.30}$$

$$C = \begin{bmatrix} I_{2\times 2} & 0_{2\times 2} \end{bmatrix}, \quad
W = \mathrm{diag}\{\sigma_{w_x}^2, \sigma_{w_y}^2\}, \quad
\mathbf{x}[n_I] = \mathbf{x}_I, \tag{20.31}$$

for $n < n_0$ or $n > n_0'$. In the above, the discrete variables $x[n]$ and
$y[n]$ (the elements of $p[n]$) are the position coordinates, $v_x[n]$ and
$v_y[n]$ describe the velocity coordinates of the moving object,
$\mathbf{x}[n]$ is the state vector, $\mathbf{y}[n]$ is the measurement
vector, $\{w[n]\}$ is a stationary Gaussian white-noise vector sequence with
a covariance matrix $W$, and $\{e[n]\}$ is a non-stationary Gaussian
white-noise sequence with a covariance matrix $E[n]$. It is also assumed
that the sequences $\{w[n]\}$ and $\{e[n]\}$ and the initial
Gaussian-distributed state $\mathbf{x}[n_I]$ are mutually uncorrelated. The
discrete-time instant $n_I$ denotes the moment of the initiation of the
resulting state estimator (described in Subsection 20.4.2).
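
The discrete CV model can be illustrated with a short sketch. The code below
is only an illustration, not part of the original algorithm description: the
function names `cv_matrices` and `cv_step` and the numerical values are
hypothetical, the state is ordered as (x, y, v_x, v_y), and a constant
sampling period T is assumed.

```python
def cv_matrices(T):
    """Return F[n+1,n] and B[n+1,n] of the CV model for sampling period T."""
    F = [[1.0, 0.0, T, 0.0],
         [0.0, 1.0, 0.0, T],
         [0.0, 0.0, 1.0, 0.0],
         [0.0, 0.0, 0.0, 1.0]]
    B = [[T * T / 2.0, 0.0],
         [0.0, T * T / 2.0],
         [T, 0.0],
         [0.0, T]]
    return F, B


def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def cv_step(state, T, w=(0.0, 0.0)):
    """One step x[n+1] = F x[n] + B w[n]; w defaults to no forcing input."""
    F, B = cv_matrices(T)
    Fx = matvec(F, state)
    Bw = matvec(B, list(w))
    return [a + b for a, b in zip(Fx, Bw)]


# Hypothetical example: target at the origin moving at (100, 50) m/s, T = 4 s.
next_state = cv_step([0.0, 0.0, 100.0, 50.0], 4.0)
print(next_state)
```

With no forcing input the position advances by T times the velocity and the
velocity is unchanged, which is the straight-line, constant-speed behaviour
assumed under H0.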

20.3.4. Model of the uniform speed change

The manoeuvre model corresponding to the hypothesis H1 is based on a
physical law of the straight-line uniformly variable motion (CA model) in
Cartesian coordinates:

$$a(t) = a_t\,u_t(t), \tag{20.32}$$

for $t_0 \le t \le t_0'$, $p(t_0) = p_0$, $v(t_0) = v_0$, where $\psi(t)$ is
the course of the object and $a_t$ is a constant tangential acceleration as
shown in Fig. 20.6.




Fig. 20.6. Geometry of a linear manoeuvre 



By taking into account (20.21), the equation (20.32) can be rewritten in
the following state-space form:

$$\dot{\mathbf{x}}(t) = F_t\,\mathbf{x}(t) + B_t\,u_t(t_0)\,a_t, \tag{20.33}$$

where

$$B_t = \begin{bmatrix} 0_{2\times 2} \\ I_{2\times 2} \end{bmatrix}, \tag{20.34}$$

with the remaining elements of the model defined by the equations (20.25).
The discretization of the model (20.33) according to

$$b_1[t_{n+1}, t_n; t_0] = \int_{t_n}^{t_n+T_n} \exp\big((t_n + T_n - \tau)F_t\big)\,B_t\,u_t(t_0)\,d\tau \tag{20.35}$$

results in the following discrete-time linear state-space model with an
exogenous input $a_1 = a_t$:

$$\mathbf{x}_1[n+1] = F[n+1,n]\,\mathbf{x}_1[n] + b_1[n+1,n]\,a_1 + B[n+1,n]\,w[n], \tag{20.36}$$

$$\mathbf{y}[n] = C\,\mathbf{x}_1[n] + e[n], \tag{20.37}$$

for $n_0 \le n \le n_0'$, where

$$b_1[n+1,n] = b_1[n+1,n;n_0] = B[n+1,n]\,u_1[n_0], \tag{20.38}$$

with the discrete-time versor tangential to the target trajectory:

$$u_1[n] = u_t(t_n) = \begin{bmatrix} \sin(\psi[n]) \\ \cos(\psi[n]) \end{bmatrix}. \tag{20.39}$$

The quantity $\mathbf{x}_1[n]$ is a state vector of the form identical to
$\mathbf{x}[n]$, while the column vector $b_1[n+1,n]$ models the way in which
the constant tangential acceleration $a_1$ influences the object's motion.

Due to the form of the controlled dynamic system, the model (20.36)-
(20.37) will be referred to as the Controlled Constant Acceleration (CCA)
model.



20.3.5. Model of the standard turn 

Discrete-time models of coordinated turns can be derived from the well-known 
equation of a circle (Lanka, 1984; Roecker and McGillem, 1989). Such an 
approach, however, results in models related to an unknown centre of the 
manoeuvre circle (Kowalczuk and Sankowski, 2000b). 

Thus another CT model is usually used that can be derived directly from 
the CLM model (20.20): 

$$a(t) = a_n\,u_n(t), \tag{20.40}$$

where

$$\psi(t) = \psi(t_0) + (t - t_0)\,\omega, \tag{20.41}$$

with $t_0 \le t \le t_0'$, $p(t_0) = p_0$, $v(t_0) = v_0$, the constant turn
rate $\omega$ and the constant tangential speed $v$. Figure 20.7 shows the
geometrical model of such a circular manoeuvre.

The turn rate is

$$\omega = \frac{a_n}{v}, \tag{20.42}$$

with $\omega > 0$ for the right-turn manoeuvres and $\omega < 0$ for the
left ones.




Fig. 20.7. Geometry of circular manoeuvres

The model (20.40) is non-linear because the vector $u_n(t)$ depends on
$\psi(t)$ and $v(t)$ (Best and Norton, 1997). The equation (20.40) can be
rewritten into a state-space form with linear and non-linear parts of the
model separated:

$$\dot{\mathbf{x}}(t) = F_t\,\mathbf{x}(t) + B_t\,u_n(t)\,a_n. \tag{20.43}$$

Because the required state-estimation algorithm is based on the discrete
KF theory, it is necessary to find a discrete-time variant of the model (20.43)
of the form identical to the model (20.36)-(20.37), namely:

$$\mathbf{x}_2[n+1] = F[n+1,n]\,\mathbf{x}_2[n] + b_2[n+1,n;\cdot]\,a_2 + B[n+1,n]\,w[n], \tag{20.44}$$

$$\mathbf{y}[n] = C\,\mathbf{x}_2[n] + e[n], \tag{20.45}$$

for $n_0 \le n \le n_0'$. The quantity $\mathbf{x}_2[n]$ is the state vector
of the form identical to $\mathbf{x}[n]$, while the column vector
$b_2[n+1,n;\cdot]$ is a suitable conversion vector that describes the
influence of a constant normal acceleration $a_2 = a_n$ on the object motion
described in the Cartesian reference framework (Sankowski and Kowalczuk,
2001).

Due to the form of the controlled dynamic system, the model (20.44)-
(20.45) will be referred to as the Controlled Coordinated Turn (CCT) model.
In the following paragraphs two approaches to the determining of the vector
$b_2[n+1,n;\cdot]$ are presented.

Approach 1: Exact discrete model 

The CCTE model can be obtained via a suitable discretization of the
continuous-time model (20.43):

$$b_2[t_{n+1}, t_n; t_0, \mathbf{x}_2[t_0], a_2] = \int_{t_n}^{t_n+T_n} \exp\big((t_n + T_n - \tau)F_t\big)\,B_t\,u_n(\tau)\,d\tau. \tag{20.46}$$

The above operation results in the discrete-time state-space model (20.44)
with

$$b_2[n+1,n;n_0,\mathbf{x}_2[n_0],a_2] = \frac{1}{\omega}
\begin{bmatrix} -T_n u_1[n] - \frac{1}{\omega}\big(u_2[n+1] - u_2[n]\big) \\ u_1[n+1] - u_1[n] \end{bmatrix}, \tag{20.47}$$

where the discrete-time versor normal to the target trajectory is defined as

$$u_2[n] = u_n(t_n) = \begin{bmatrix} \cos(\psi[n]) \\ -\sin(\psi[n]) \end{bmatrix}, \tag{20.48}$$

$$\psi[n] = \psi[n_0] + \omega \sum_{i=n_0}^{n-1} T_i. \tag{20.49}$$

The described approach results in a non-linear discrete model, which
preserves an analytical description of the circular trajectory in intervals
between the sampling instants and precisely describes the positional and
velocity parameters in these instants.

Approach 2: Approximate discrete model 

An approximate conversion vector $b_2[n+1,n;\cdot]$ (of a direct
discrete-time model) can be obtained by using the known expression (20.40),
describing the values of the acceleration vector $a(t)$ in the discrete-time
instants $t = t_n$. Assuming that the value of the vector $a(t)$ is constant
between the sampling moments, an approximate CCTA model of the influence of
the normal acceleration has the following form:

$$b_2[n+1,n;n_0,\mathbf{x}_2[n_0],a_2] = B[n+1,n]\,u_2[n]. \tag{20.50}$$

The models (20.47) and (20.50) present two alternative methods of
modelling the influence of the normal acceleration.
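
The two conversion vectors can be compared numerically. The following
sketch is only an illustration under stated assumptions: the function names
are hypothetical, the sampling period T is constant, and the exact vector is
taken in the form (20.47) with the versors (20.39) and (20.48). It returns
the four elements of each vector (two position rows, two velocity rows) and
the largest element-wise difference between them.

```python
import math


def u1(psi):
    """Tangential versor [sin(psi), cos(psi)]."""
    return (math.sin(psi), math.cos(psi))


def u2(psi):
    """Normal versor [cos(psi), -sin(psi)]."""
    return (math.cos(psi), -math.sin(psi))


def b2_exact(psi, omega, T):
    """Exact (CCTE) conversion vector for course psi and turn rate omega != 0."""
    psi_next = psi + omega * T
    t_now, t_next = u1(psi), u1(psi_next)
    n_now, n_next = u2(psi), u2(psi_next)
    pos = [(-T * t_now[k] - (n_next[k] - n_now[k]) / omega) / omega
           for k in range(2)]
    vel = [(t_next[k] - t_now[k]) / omega for k in range(2)]
    return pos + vel


def b2_approx(psi, T):
    """Approximate (CCTA) conversion vector B[n+1,n] u2[n]."""
    n_now = u2(psi)
    return [T * T / 2.0 * n_now[0], T * T / 2.0 * n_now[1],
            T * n_now[0], T * n_now[1]]


def mismatch(psi, omega, T):
    """Largest absolute element-wise difference between the two vectors."""
    return max(abs(e - a)
               for e, a in zip(b2_exact(psi, omega, T), b2_approx(psi, T)))


# Hypothetical values: v = 100 m/s, T = 4 s, a_n = 2.5 and 6 m/s^2.
for a_n in (2.5, 6.0):
    print(a_n, mismatch(0.0, a_n / 100.0, 4.0))
```

In this sketch the difference grows with the product of the turn rate and
the sampling period and vanishes as that product tends to zero.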



20.4. State estimation during uniform motions 

The system (20.27)-(20.28) constitutes the design basis for tracking
non-manoeuvring vessels by the Kalman filter (Anderson and Moore, 1979).

20.4.1. Base Kalman filter 

Based on the dynamic model (20.27)-(20.28) of the uniform motion and
accompanying assumptions, the discrete-time KF (base KF) can be shown to
have the following form:

$$\hat{\mathbf{x}}[n+1|n+1] = \hat{\mathbf{x}}[n+1|n] + L[n+1]\,\nu[n+1], \tag{20.51}$$

$$\hat{\mathbf{x}}[n+1|n] = F[n+1,n]\,\hat{\mathbf{x}}[n|n], \tag{20.52}$$

$$\nu[n+1] = \mathbf{y}[n+1] - C\,\hat{\mathbf{x}}[n+1|n], \tag{20.53}$$

$$P[n+1|n+1] = P[n+1|n] - L[n+1]\,C\,P[n+1|n], \tag{20.54}$$

$$P[n+1|n] = F[n+1,n]\,P[n|n]\,F^T[n+1,n] + B[n+1,n]\,W\,B^T[n+1,n], \tag{20.55}$$

$$\omega[n+1] = C\,P[n+1|n]\,C^T + E[n+1], \tag{20.56}$$

$$L[n+1] = P[n+1|n]\,C^T\,\omega^{-1}[n+1]. \tag{20.57}$$

The quantities $\hat{\mathbf{x}}[n|n]$ and $P[n|n]$ given above denote a
filtered state estimate and the covariance matrix of its estimation errors,
respectively, while $\hat{\mathbf{x}}[n|n-1]$ and $P[n|n-1]$ denote a
predicted state estimate and the respective covariance matrix of its
prediction errors. The remaining parameters of the filter are: a KF gain
matrix $L[n]$, an innovation process $\nu[n]$ and its covariance matrix
$\omega[n]$.

It is clear that the above KF used for tracking manoeuvring objects will
yield biased estimates. On the other hand, the innovation process can be used
for detecting non-uniform movements.
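
One iteration of the base KF can be sketched as follows. This is a minimal
illustration, not a reference implementation: matrices are stored as lists
of rows, vectors as single-column matrices, the helper names are
hypothetical, and all numerical values (T, W, E, the initial state and its
covariance) are assumptions chosen only to make the example run.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def inv2(M):
    """Inverse of a 2x2 matrix (the innovation covariance is 2x2 here)."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def kf_step(x, P, y, F, B, C, W, E):
    """One base-KF iteration: prediction, innovation, gain, correction."""
    x_pred = matmul(F, x)                                      # state prediction
    P_pred = add(matmul(matmul(F, P), transpose(F)),
                 matmul(matmul(B, W), transpose(B)))           # its covariance
    nu = sub(y, matmul(C, x_pred))                             # innovation
    omega = add(matmul(matmul(C, P_pred), transpose(C)), E)    # innovation cov.
    L = matmul(matmul(P_pred, transpose(C)), inv2(omega))      # gain
    x_filt = add(x_pred, matmul(L, nu))                        # filtered state
    P_filt = sub(P_pred, matmul(matmul(L, C), P_pred))         # its covariance
    return x_filt, P_filt, nu, omega


# Hypothetical set-up: CV model with T = 4 s.
T = 4.0
F = [[1.0, 0.0, T, 0.0], [0.0, 1.0, 0.0, T],
     [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
B = [[T * T / 2.0, 0.0], [0.0, T * T / 2.0], [T, 0.0], [0.0, T]]
C = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
W = [[4.0, 0.0], [0.0, 4.0]]
E = [[1.0e4, 0.0], [0.0, 1.0e4]]
x0 = [[0.0], [0.0], [100.0], [50.0]]
P0 = [[1.0e4, 0.0, 0.0, 0.0], [0.0, 1.0e4, 0.0, 0.0],
      [0.0, 0.0, 100.0, 0.0], [0.0, 0.0, 0.0, 100.0]]
y1 = [[400.0], [200.0]]   # measurement that agrees with the prediction

x1, P1, nu1, om1 = kf_step(x0, P0, y1, F, B, C, W, E)
print(nu1, x1)
```

The innovation and its covariance are returned together with the estimate
because the detectors and the input estimation method discussed later
operate on exactly these two quantities.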



20.4.2. Initiation of the tracking filter 

Coordinates of the estimate $\hat{\mathbf{x}}[2|2]$ of the state vector are
initiated using measurements from two successive radar scans. By assuming
that the object moves along a straight line, the initial state vector is

$$\hat{\mathbf{x}}[2|2] = \begin{bmatrix} \hat{p}[2|2] \\ \hat{v}[2|2] \end{bmatrix}, \tag{20.58}$$

where the initial position and velocity estimates are calculated from the
first two measured positions $\tilde{p}[1]$ and $\tilde{p}[2]$ as

$$\hat{p}[2|2] = \tilde{p}[2], \tag{20.59}$$

$$\hat{v}[2|2] = \frac{1}{T_1}\big(\tilde{p}[2] - \tilde{p}[1]\big). \tag{20.60}$$

In order to complete the initiation procedure, we have to derive the
covariance matrix $P[2|2]$ corresponding to the vector
$\hat{\mathbf{x}}[2|2]$.

Using the instrumental model of converted measurements (20.7) leads to
measurements for the time instants $n = 1$ and $n = 2$ described as follows:

$$\tilde{p}[1] = p[1] + e[1], \tag{20.61}$$

$$\tilde{p}[2] = p[2] + e[2]. \tag{20.62}$$

Consequently, the initial values of the state vector obtain the form

$$\hat{p}[2|2] = p[2] + e[2], \tag{20.63}$$

$$\hat{v}[2|2] = \frac{1}{T_1}\big(p[2] - p[1]\big) + \frac{1}{T_1}\big(e[2] - e[1]\big). \tag{20.64}$$

Each initial value of (20.63)-(20.64) is expressed as a superposition of a
true value and its estimation error. Covariances of these errors can be
derived as follows ($\mathbb{E}$ denotes the expectation operator):

$$\mathrm{cov}\big[e[2], e[2]\big] = \mathbb{E}\big[e[2]e^T[2]\big] = E[2], \tag{20.65}$$

$$\mathrm{cov}\Big[e[2], \tfrac{1}{T_1}(e[2]-e[1])\Big]
= \mathbb{E}\Big[e[2]\,\tfrac{1}{T_1}(e[2]-e[1])^T\Big]
= \mathbb{E}\Big[\tfrac{1}{T_1}e[2]e^T[2] - \tfrac{1}{T_1}e[2]e^T[1]\Big]
= \frac{1}{T_1}E[2], \tag{20.66}$$

$$\mathrm{cov}\Big[\tfrac{1}{T_1}(e[2]-e[1]), \tfrac{1}{T_1}(e[2]-e[1])\Big]
= \mathbb{E}\Big[\tfrac{1}{T_1^2}(e[2]-e[1])(e[2]-e[1])^T\Big]
= \mathbb{E}\Big[\tfrac{1}{T_1^2}\big(e[2]e^T[2] - e[2]e^T[1] - e[1]e^T[2] + e[1]e^T[1]\big)\Big]
= \frac{1}{T_1^2}\big(E[1] + E[2]\big), \tag{20.67}$$

where $E[n]$ is the covariance matrix of the converted measurement
$\mathbf{y}[n]$ computed according to (20.9).

Thus, in the analysed two-dimensional case, the resulting initial
covariance matrix of the filtering error has the form

$$P[2|2] = \frac{1}{T_1^2}
\begin{bmatrix} T_1^2 E[2] & T_1 E[2] \\ T_1 E^T[2] & E[1] + E[2] \end{bmatrix}. \tag{20.68}$$
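
The two-point initiation can be sketched in a few lines. The code is only
an illustration: the function name `two_point_init` is hypothetical, the
state is ordered as (x, y, v_x, v_y), and the measurement covariances in
the example are assumed values, not those of the chapter's radar model.

```python
def two_point_init(p1, p2, E1, E2, T1):
    """Initial state and covariance from two measured positions.

    p1, p2 : measured positions (x, y) at n = 1 and n = 2
    E1, E2 : 2x2 covariance matrices of the two measurements
    T1     : time between the two measurements
    """
    velocity = [(b - a) / T1 for a, b in zip(p1, p2)]
    state = list(p2) + velocity
    P = [[0.0] * 4 for _ in range(4)]
    for i in range(2):
        for j in range(2):
            P[i][j] = E2[i][j]                                  # position block
            P[i][j + 2] = E2[i][j] / T1                         # cross block
            P[i + 2][j] = E2[j][i] / T1                         # its transpose
            P[i + 2][j + 2] = (E1[i][j] + E2[i][j]) / T1 ** 2   # velocity block
    return state, P


# Hypothetical example: two plots 4 s apart, 10 m standard deviation per axis.
E = [[100.0, 0.0], [0.0, 100.0]]
state0, P0 = two_point_init((0.0, 0.0), (400.0, 200.0), E, E, 4.0)
print(state0)
print(P0)
```

The velocity block is larger than the position block scaled by the squared
period would suggest, because the errors of both measurements enter the
velocity estimate.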

20.5. Identification of the control signal 

A state-estimation algorithm can be easily modified (adapted) so as to re- 
flect the variable target structure and parameters. It is, however, necessary 
to detect the occurrence of a manoeuvre, isolate its type (structural identi- 
fication) and estimate the parameters of the manoeuvre model (parametric 
identification) . 



20.5.1. Input estimation method 

The principal approach to the estimation of an unknown control signal (IE
method) can be found in (Chan et al., 1979). In the derivation of the
corresponding algorithm, two types of Kalman filters are employed:

• a base (H0) filter of (20.51)-(20.57) for the model (20.27)-(20.28), which
assumes that the control signal $a_j[n] = 0$,

• a hypothetical filter for the following general manoeuvre model:

$$\mathbf{x}_j[n+1] = F[n+1,n]\,\mathbf{x}_j[n] + b_j[n+1,n]\,a_j + B[n+1,n]\,w[n], \tag{20.69}$$

which, in particular, takes the form of (20.36)-(20.37) or (20.44)-(20.45),
corresponding to the hypotheses Hj, $j \in \{1,2\}$, with the presumption
that the control signal $a_j[n]$ is non-zero and its value is known for a
certain period of time.

It is thus assumed that a non-zero acceleration starts exerting its influence
on the object at the time instant $n_0^N = n - N$. This influence can be
observed within a sliding window of length $N$ at the moments
$n_i^N = n - N + i$, $i = 1, \ldots, N$, namely, from the time instant
$n_1^N$ till $n_N^N = n$.



[Figure 20.8. Time axis with the observation window marked over the last
N sampling instants.]

Fig. 20.8. Sliding N-length window of observation



Let $\hat{\mathbf{x}}[n_{i+1}^N|n_i^N]$ be an estimate yielded by the base
(H0) filter in the interval during which the hypothesis Hj, $j \in \{1,2\}$,
is true:

$$\hat{\mathbf{x}}[n_{i+1}^N|n_i^N] = \Phi[n_{i+1}^N, n_i^N]\,\hat{\mathbf{x}}[n_i^N|n_{i-1}^N] + F[n_{i+1}^N, n_i^N]\,L[n_i^N]\,\mathbf{y}[n_i^N], \tag{20.70}$$

with

$$\Phi[n_{i+1}^N, n_i^N] = F[n_{i+1}^N, n_i^N]\big(I_{4\times 4} - L[n_i^N]\,C\big), \tag{20.71}$$

where $i = 0, \ldots, N-1$, while $L[n_i^N]$ denotes the gain matrix of the
KF. Moreover, let $\hat{\mathbf{x}}_j[n_{i+1}^N|n_i^N]$ be an estimate
obtained from the hypothetical filter under the same circumstances.

By assuming that $a_j[n_i^N] = 0$, with $n_i^N < n_0^N$, the initial
condition of the recursion equation (20.70) is

$$\hat{\mathbf{x}}_j[n_0^N|n_{-1}^N] = \hat{\mathbf{x}}[n_0^N|n_{-1}^N]. \tag{20.72}$$

It can be shown that if the acceleration does not change its value
($a_j[n_i^N] = a_j$) during the manoeuvre, the innovation process of the
base filter is

$$\nu[n_i^N] = \nu_j^*[n_i^N] + \psi_j[n_i^N, n_0^N]\,a_j, \tag{20.73}$$

where $\nu_j^*[n_i^N]$ is the innovation process of the hypothetical filter
and

$$\psi_j[n_i^N, n_0^N] = C\,m_j[n_i^N, n_0^N], \tag{20.74}$$

$$m_j[n_i^N, n_0^N] = \sum_{l=0}^{i-1}\left[\prod_{m=0}^{i-l-2} \Phi[n_{i-m}^N, n_{i-m-1}^N]\right] b_j[n_{l+1}^N, n_l^N], \tag{20.75}$$

for $i = 1, \ldots, N$. The vector $\psi_j[n_i^N, n_0^N]\,a_j$ describes the
result of a mismatch between the base filter and the hypothetical filter
based on a manoeuvre model that is represented by a bias in the innovation
process observed within the time interval from $n_1^N$ to $n_N^N$. With the
model (20.73) of the innovation process of the base filter, we can evaluate
the value of the acceleration $a_j$.

As the acceleration exercises its influence on the object for samples taken
beginning with the time $n_0^N$, its effect can be observed in the innovation
process (20.73) within the sliding window, $n_1^N$ through $n_N^N$, where
$n_N^N = n$ is the actual instant. All such $N$ (successive) equations can
be shown in the form of an aggregate matrix equation:



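$$V_N[n] = \Psi_{j,N}[n]\,a_j + V_{j,N}^*[n], \tag{20.76}$$

where

$$V_N[n] = \begin{bmatrix} \nu[n_1^N] \\ \vdots \\ \nu[n_N^N] \end{bmatrix}_{N d_y \times 1}, \quad
\Psi_{j,N}[n] = \begin{bmatrix} \psi_j[n_1^N, n_0^N] \\ \vdots \\ \psi_j[n_N^N, n_0^N] \end{bmatrix}_{N d_y \times 1}, \quad
V_{j,N}^*[n] = \begin{bmatrix} \nu_j^*[n_1^N] \\ \vdots \\ \nu_j^*[n_N^N] \end{bmatrix}_{N d_y \times 1}, \tag{20.77}$$

and $V_{j,N}^*[n]$ is a vector of independent random Gaussian-distributed
variables characterised by the zero mean and the covariance matrices
$\omega[n_i^N]$, $i = 1, \ldots, N$. Thus the covariance matrix of the
vector $V_{j,N}^*[n]$ is

$$\Omega_N[n] = \begin{bmatrix} \omega[n_1^N] & & 0_{2\times 2} \\ & \ddots & \\ 0_{2\times 2} & & \omega[n_N^N] \end{bmatrix}_{N d_y \times N d_y}. \tag{20.78}$$

The estimate $\hat{a}_{j,N}[n]$ of the acceleration $a_j$ can be computed by
means of the Generalised Least Squares (GLS) approach:

$$\hat{a}_{j,N}[n] = \frac{h_{j,N}[n]}{g_{j,N}[n]}, \tag{20.79}$$

where $j \in \{1,2\}$, and

$$g_{j,N}[n] = \Psi_{j,N}^T[n]\,\Omega_N^{-1}[n]\,\Psi_{j,N}[n], \tag{20.80}$$

$$h_{j,N}[n] = \Psi_{j,N}^T[n]\,\Omega_N^{-1}[n]\,V_N[n]. \tag{20.81}$$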



20.5.2. Basic properties of the input estimation method 

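It can be easily shown that the estimator (20.79) is characterised by the
properties given below.

An estimation error defined by

$$\hat{a}_{j,N}[n] - a_j = \frac{h_{j,N}[n] - g_{j,N}[n]\,a_j}{g_{j,N}[n]} \tag{20.82}$$

with the model (20.76) can be expressed as

$$\hat{a}_{j,N}[n] - a_j = \frac{\Psi_{j,N}^T[n]\,\Omega_N^{-1}[n]\,V_{j,N}^*[n]}{g_{j,N}[n]}. \tag{20.83}$$

Since the expected value of the estimation error is

$$\mathbb{E}\big[\hat{a}_{j,N}[n] - a_j\big] = \frac{\Psi_{j,N}^T[n]\,\Omega_N^{-1}[n]\,\mathbb{E}\big[V_{j,N}^*[n]\big]}{g_{j,N}[n]} = 0, \tag{20.84}$$

the estimator (20.79) is unbiased.

Using the basic properties of symmetric non-singular matrices, the
covariance of the estimation error

$$\sigma_{j,N}^2[n] = \mathrm{cov}\big[(\hat{a}_{j,N}[n] - a_j)\big] = \mathbb{E}\big[(\hat{a}_{j,N}[n] - a_j)^2\big] \tag{20.85}$$

can be expressed as

$$\sigma_{j,N}^2[n] = \frac{1}{g_{j,N}[n]}. \tag{20.86}$$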
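
Because the covariance matrix of the stacked innovations is block diagonal,
the GLS quantities reduce to sums over the window. The sketch below
illustrates this under stated assumptions: the function name
`gls_input_estimate` is hypothetical, each innovation has two elements, and
the numbers in the example (the vectors standing in for the mismatch terms
of (20.74), the innovation covariances and the true acceleration) are
invented for the illustration.

```python
def inv2(M):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def gls_input_estimate(psis, omegas, nus):
    """Scalar GLS estimate of a constant acceleration and its error variance.

    psis   : N two-element mismatch vectors, one per window instant
    omegas : N 2x2 innovation covariance matrices of the base filter
    nus    : N two-element innovations of the base filter
    """
    g = 0.0
    h = 0.0
    for psi, omega, nu in zip(psis, omegas, nus):
        oi = inv2(omega)
        # row vector psi^T * omega^{-1}
        row = [psi[0] * oi[0][0] + psi[1] * oi[1][0],
               psi[0] * oi[0][1] + psi[1] * oi[1][1]]
        g += row[0] * psi[0] + row[1] * psi[1]
        h += row[0] * nu[0] + row[1] * nu[1]
    return h / g, 1.0 / g


# Hypothetical noise-free window: innovations equal to psi * a with a = 1.5.
psis = [(8.0, 0.0), (24.0, 0.0), (48.0, 0.0)]
omegas = [[[100.0, 0.0], [0.0, 100.0]]] * 3
nus = [(1.5 * p[0], 1.5 * p[1]) for p in psis]
a_hat, variance = gls_input_estimate(psis, omegas, nus)
print(a_hat, variance)
```

In the noise-free case the estimate reproduces the assumed acceleration,
and the returned variance depends only on the mismatch vectors and the
innovation covariances, not on the innovations themselves.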



20.6. Detection and isolation of manoeuvres 

The key concept of the proposed state-estimation algorithm takes the form 
of three identification steps: 

1. Detection: The analysis of the innovation process of the KF based on
the CV model is performed on-line for the detection of the occurrence of a
manoeuvre.

2. Isolation: After the detection of a manoeuvre, the input estimation
algorithm (Chan et al., 1979) based on the CV, CCA and CCT models is
used for the isolation of the manoeuvre class (Sankowski and Kowalczuk,
2001). Thus in the analysed case, the isolation procedure is equivalent to the
structural identification of the manoeuvre model.

3. Estimation: With the aid of both the hints that result from the detection
and isolation procedures and the informational content of the IE algorithm,
the estimation of the basic parameters of the manoeuvre trajectory is
performed, which means that the estimation procedure is here equivalent to
the parametric identification of the manoeuvre model.

20.6.1. Detection of manoeuvres 

Procedures of detecting manoeuvres in KF-based tracking filters are usually
based on the analysis of the innovation process of the filter (Kerr, 1989).
Typically the innovation process $\{\nu[n]\}$ is tested in a rectangular or
weighted sliding window for the presence of a bias. Using a detection
statistic $\mu_D[n]$ calculated from the innovation process along with a
detection threshold $\bar{\mu}_D$, the manoeuvre detection instant $\eta$
that approves the hypothesis HM can be determined by the following
detection algorithm:

    if $\mu_D[n] > \bar{\mu}_D$ then
        hypothesis HM is accepted,
        $\eta = n$                                              (20.87)
    else
        hypothesis HM is rejected.

In this work two such detectors will be presented and tested. Both
detectors are based on the $\chi^2$-distributed statistics $\mu_D[n]$. By
choosing the probability of false-manoeuvre alarms, one can set the
sensitivity of the detector and find the value of the threshold
$\bar{\mu}_D$ in appropriate tables.

Approach 1: Extended input estimation detector 

The IE method presented in Section 20.5 can be used for the identification
of the vector control signal.

Based on the dynamic model

$$\mathbf{x}^*[n+1] = F[n+1,n]\,\mathbf{x}^*[n] + B[n+1,n]\,a[n] + B[n+1,n]\,w[n], \tag{20.88}$$

and using the reasoning presented in Section 20.5, Park et al. (1995)
consider an $N_m$-element set of IE estimators:

$$\hat{a}_N[n] = G_N^{-1}[n]\,h_N[n], \tag{20.89}$$

for $N = 1, \ldots, N_m$. In the above set of estimators, the quantities
$G_N[n]$ and $h_N[n]$ can be calculated recursively:

$$G_N[n] = G_{N-1}[n-1] + \psi^T[n_N^N, n_0^N]\,\omega^{-1}[n]\,\psi[n_N^N, n_0^N], \tag{20.90}$$

$$h_N[n] = h_{N-1}[n-1] + \psi^T[n_N^N, n_0^N]\,\omega^{-1}[n]\,\nu[n], \tag{20.91}$$

for $N = 1, \ldots, N_m$, $G_0[n] = 0_{2\times 2}$, $h_0[n] = 0_{2\times 1}$,
where the quantities $\psi[n_N^N, n_0^N]$ are defined by an equation similar
to (20.74). This technique is referred to as an Extended Input Estimation
(EIE) method.

The equations (20.90)-(20.91) describe an algorithm recursive with
respect to time $n$ and estimator length $N$, which is interpreted as a
relative (counted backwards from the current instant $n$) hypothetical time
of the beginning of the manoeuvre. The EIE algorithm allows estimating both
the start instant $n_0$ and the intensity of the manoeuvre.

For the detection of a non-zero control signal, in each iteration of the base
KF, the following set of statistics, for $N = 1, \ldots, N_m$, is computed:

$$\mu_N[n] = \hat{a}_N^T[n]\,G_N[n]\,\hat{a}_N[n], \tag{20.92}$$

which can be approximated by $\chi^2$-distributed random variables with 2
degrees of freedom. The maximum statistic

$$\mu_D[n] = \max_{N=1,\ldots,N_m} \{\mu_N[n]\} \tag{20.93}$$

can be confronted with the corresponding value of the threshold
$\bar{\mu}_D$ in the detector (20.87).
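
The final decision step of this detector can be sketched as follows. The
code is an illustration only: the function names are hypothetical and the
statistics in the example are invented. It uses the fact that a
chi-square variable with 2 degrees of freedom exceeds the value c with
probability exp(-c/2), so that the threshold follows from the chosen
false-alarm probability in closed form instead of from tables. Since the
maximum over several window lengths is tested, the actual false-alarm rate
may be higher than this nominal value.

```python
import math


def chi2_threshold_2dof(p_false_alarm):
    """Value exceeded with probability p_false_alarm by a chi-square(2) variable."""
    return -2.0 * math.log(p_false_alarm)


def detect(stats, threshold):
    """Apply the threshold test to the maximum statistic.

    stats : statistics for the window lengths N = 1, ..., N_m
    Returns (manoeuvre accepted, maximum statistic, window length N of the maximum).
    """
    mu_d = max(stats)
    n_best = stats.index(mu_d) + 1
    return mu_d > threshold, mu_d, n_best


# Hypothetical statistics for N = 1, ..., 4 and a nominal 1 % false-alarm rate.
threshold = chi2_threshold_2dof(0.01)
print(threshold, detect([1.2, 3.4, 12.7, 8.0], threshold))
```

The window length returned with the maximum corresponds to the relative
manoeuvre onset time mentioned above.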

Approach 2: Virtual dimension filter detector 

The second detector of the occurrence of a manoeuvre is based on the method
proposed by Bar-Shalom and Birmiwal (1982) for the Virtual Dimension
Filter (VDF). In each estimation step, a fading memory average of successive
innovation-process products of the KF is computed:

$$\mu_D[n] = \alpha\,\mu_D[n-1] + \delta[n], \tag{20.94}$$

with

$$\delta[n] = \nu^T[n]\,\omega^{-1}[n]\,\nu[n], \tag{20.95}$$

where $0 < \alpha < 1$. The product $\delta[n]$ can be approximated by a
$\chi^2$-distributed random variable with $d_y$ degrees of freedom. As

$$\lim_{n\to\infty} \mathbb{E}\big[\mu_D[n]\big] = \frac{d_y}{1-\alpha} = N_D\,d_y, \tag{20.96}$$

the quantity

$$N_D = (1 - \alpha)^{-1} \tag{20.97}$$

can be interpreted as the effective window length, over which the presence of
a manoeuvre is tested.

Because $\alpha$ can be chosen so as to obtain an integer-valued
$N_D \in I$, the detection statistic $\mu_D$ can be approximated by a
$\chi^2$-distributed random variable with $N_D d_y$ degrees of freedom.
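
The fading-memory recursion and its effective window length can be sketched
briefly. The function names below are hypothetical, and the input sequence
in the example is a constant equal to the mean of a chi-square variable
with two degrees of freedom, chosen only to show the steady-state level.

```python
def fading_memory(deltas, alpha, mu0=0.0):
    """Return the sequence mu[n] = alpha * mu[n-1] + delta[n]."""
    mu = mu0
    history = []
    for delta in deltas:
        mu = alpha * mu + delta
        history.append(mu)
    return history


def effective_window(alpha):
    """Effective window length 1 / (1 - alpha)."""
    return 1.0 / (1.0 - alpha)


# Hypothetical example: alpha = 0.8 and a constant normalised innovation of 2.
alpha = 0.8
mu = fading_memory([2.0] * 200, alpha)
print(effective_window(alpha), mu[-1])
```

With a constant input the statistic settles at the input multiplied by the
effective window length, which matches the limit of the expected value
given above.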

20.6.2. Isolation of manoeuvres

If the detector (20.87) accepts the hypothesis HM, meaning that the object 
is manoeuvring, the isolation of the manoeuvre type is required to modify 
(adapt) the estimation process properly. 

A procedure proposed for the qualification of the detected manoeuvre to
one of the classes considered utilises the IE algorithm and is based on the
following reasoning. If the identification algorithm, based on the model CCT
associated with the hypothesis H2, is used for the estimation of the normal
acceleration of an object whose trajectory corresponds to the hypothesis H2,
the resulting estimate will differ significantly from zero. If the same
algorithm, based on the model associated with the hypothesis H2, is used
for the estimation of the normal acceleration of an object whose trajectory
corresponds to the hypothesis H1 (CCA model), the obtained estimate is
supposed to be close to zero. Analogous reasoning applies to the IE
algorithm based on the model CCA associated with the hypothesis H1.

Therefore, we propose to construct two statistics, based on the estimates
of the tangential and normal accelerations, which allow testing the statistical
significance of the estimates. Expertise based on these statistics makes it
possible to accept either H1 or H2. It means that the detected manoeuvre
corresponds to either (20.36) or (20.44).

Non-linear input estimation 

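If the detector (20.87) accepts the hypothesis HM, meaning that the object
is manoeuvring, the estimate $\hat{a}_{1,N}[\eta]$ of the tangential
acceleration can be calculated according to the equation (20.79), for the
assumed window length $N$ and the necessary quantities taken from the KF in
the last $N$ iterations and stored in a computer memory. It is not, however,
possible to obtain an accurate estimate of the normal acceleration in the
same way, because the model (20.44) is non-linear.

In order to cope with the non-linearity of the model (20.44), the normal
acceleration $a_2$ can be estimated by using the following iterations
performed $K_m$ times:

$$\hat{a}_{2,N}[\eta;k] = \frac{h_{2,N}[\eta;k]}{g_{2,N}[\eta;k]}, \tag{20.98}$$

for $k = 1, \ldots, K_m$ and $\hat{a}_{2,N}[\eta;0] = \tilde{a}_{2,N}[\eta]$,
where the right-hand side is evaluated with the estimate
$\hat{a}_{2,N}[\eta;k-1]$ from the previous iteration. The equation (20.98)
describes the set of estimators (20.79), resulting from the IE method and
the model (20.44) (Sankowski and Kowalczuk, 2001). The factors
$g_{2,N}[\eta;k]$ and $h_{2,N}[\eta;k]$ in successive iterations are computed
using the following formulae:

$$g_{2,N}[\eta;k] = \Psi_{2,N}^T[\eta;k]\,\Omega_N^{-1}[\eta]\,\Psi_{2,N}[\eta;k], \tag{20.99}$$

$$h_{2,N}[\eta;k] = \Psi_{2,N}^T[\eta;k]\,\Omega_N^{-1}[\eta]\,V_N[\eta], \tag{20.100}$$

for $\eta_i^N = \eta - N + i$, $i = 1, \ldots, N$, and $k = 1, \ldots, K_m$:

$$\psi_2[\eta_i^N, \eta_0^N; k] = C\,m_2[\eta_i^N, \eta_0^N; k], \tag{20.101}$$

$$m_2[\eta_i^N, \eta_0^N; k] = \sum_{l=0}^{i-1}\left[\prod_{m=0}^{i-l-2} \Phi[\eta_{i-m}^N, \eta_{i-m-1}^N]\right] b_2[\eta_{l+1}^N, \eta_l^N; k]. \tag{20.102}$$

The key element $b_2[\eta_{l+1}^N, \eta_l^N; k]$ of the CCT model can be
calculated according to either

$$b_2[\eta_{l+1}^N, \eta_l^N; k] = \frac{1}{\hat{\omega}_N[\eta;k]}
\begin{bmatrix} -T_l\,u_1[\eta_l^N;k] - \frac{1}{\hat{\omega}_N[\eta;k]}\big(u_2[\eta_{l+1}^N;k] - u_2[\eta_l^N;k]\big) \\ u_1[\eta_{l+1}^N;k] - u_1[\eta_l^N;k] \end{bmatrix} \tag{20.103}$$

for the exact discrete model CCTE, or

$$b_2[\eta_{l+1}^N, \eta_l^N; k] = B[\eta_{l+1}^N, \eta_l^N]\,u_2[\eta_l^N;k] \tag{20.104}$$

for the approximate discrete model CCTA, both with

$$u_1[\eta_l^N;k] = \begin{bmatrix} \sin(\hat{\psi}[\eta_l^N;k]) \\ \cos(\hat{\psi}[\eta_l^N;k]) \end{bmatrix}, \tag{20.105}$$

$$u_2[\eta_l^N;k] = \begin{bmatrix} \cos(\hat{\psi}[\eta_l^N;k]) \\ -\sin(\hat{\psi}[\eta_l^N;k]) \end{bmatrix}, \tag{20.106}$$

$$\hat{\psi}[\eta_l^N;k] = \hat{\psi}[\eta_0^N] + \hat{\omega}_N[\eta;k] \sum_{i=0}^{l-1} T_i, \tag{20.107}$$

$$\hat{\psi}[\eta_0^N] = \arctan\frac{\hat{v}_x[\eta_0^N|\eta_0^N]}{\hat{v}_y[\eta_0^N|\eta_0^N]}, \tag{20.108}$$

$$\hat{\omega}_N[\eta;k] = \frac{\hat{a}_{2,N}[\eta;k-1]}{\hat{v}[\eta_0^N]}, \tag{20.109}$$

$$\hat{v}[\eta_0^N] = \sqrt{\hat{v}_x^2[\eta_0^N|\eta_0^N] + \hat{v}_y^2[\eta_0^N|\eta_0^N]}. \tag{20.110}$$

This iterative procedure is repeated until the changes of the estimates
$\hat{a}_{2,N}[\eta;k]$ obtained in successive $k$ iterations become
sufficiently small.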

As the elements of the vector (20.103) include the factors
$1/\hat{\omega}_N[\eta;k]$, it is not possible to start the iterative
identification procedure with the initial value $\hat{a}_{2,N}[\eta;0] = 0$.
Thus we propose using the rough estimates (initial guess)
$\tilde{a}_{2,N}[\eta]$ of the normal acceleration $a_n$. This can be done by
performing the estimation (20.79) and the approximation of the influence of
the normal acceleration:

$$b_2[n_{i+1}^N, n_i^N] \approx B[n_{i+1}^N, n_i^N]\,u_2[n_0^N], \tag{20.111}$$

for $i = 0, \ldots, N-1$, which has been obtained by linearizing the
discrete-time model (20.50) (Kowalczuk and Sankowski, 2000a). The use of
rough estimates of the normal acceleration has the significant advantage of
a faster convergence of the iterative estimation of (20.98) as compared to
the use of ad-hoc values.



Recognition of the manoeuvre class 

The classification of the object manoeuvre is performed on the basis of the
two sets of statistics:

$$\mu_{1,N}[\eta] = \hat{a}_{1,N}^2[\eta]\,g_{1,N}[\eta], \tag{20.112}$$

$$\mu_{2,N}[\eta] = \hat{a}_{2,N}^2[\eta;K_m]\,g_{2,N}[\eta;K_m], \tag{20.113}$$

each for $N = 1, \ldots, N_m$, which can be approximated by random variables
$\chi^2$-distributed with one degree of freedom.

By introducing the following denotations:

$$\mu_1^M = \max_{N=1,\ldots,N_m} \{\mu_{1,N}[\eta]\}, \tag{20.114}$$

$$N_1 = \arg\max_{N=1,\ldots,N_m} \{\mu_{1,N}[\eta]\}, \tag{20.115}$$

$$\mu_2^M = \max_{N=1,\ldots,N_m} \{\mu_{2,N}[\eta]\}, \tag{20.116}$$

$$N_2 = \arg\max_{N=1,\ldots,N_m} \{\mu_{2,N}[\eta]\}, \tag{20.117}$$

the proposed decision procedure can be expressed as follows:

    if $\mu_1^M > \mu_2^M$ then
        hypothesis H1 is accepted
    else                                                        (20.118)
        hypothesis H2 is accepted.

The above isolation procedure can be extended in order to perform both
the manoeuvre isolation and detection functions:

    if $\mu_1^M > \bar{\mu}_I$ and $\mu_1^M > \mu_2^M$ then
        hypothesis H1 is accepted
    else if $\mu_2^M > \bar{\mu}_I$ and $\mu_2^M \ge \mu_1^M$ then
        hypothesis H2 is accepted                               (20.119)
    else
        hypothesis HM is rejected.

As has been mentioned before, for a given probability of false alarms the
value of the threshold $\bar{\mu}_I$ can be found in the corresponding
tables. The detector (20.119), henceforth referred to as a DIE detector, is
impractical since it requires performing a complete calculation of the
statistics (20.112) and (20.113) in each iteration of the base Kalman
filter. Because the sets of statistics (20.112) and (20.113) are equivalent
to the outputs of a set of filters matched with the expected manoeuvres, the
DIE detector's properties will be tested for a concluding reference.
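
The decision rules can be sketched as a single function. The code is an
illustration under stated assumptions: the function names and the example
statistics are hypothetical, ties are resolved in favour of H2, and the
threshold is obtained from the standard normal quantile because a
chi-square variable with one degree of freedom is the square of a standard
normal variable. As the maxima over several window lengths are tested, the
nominal false-alarm probability is only approximate.

```python
from statistics import NormalDist


def chi2_threshold_1dof(p_false_alarm):
    """Value exceeded with probability p_false_alarm by a chi-square(1) variable."""
    z = NormalDist().inv_cdf(1.0 - p_false_alarm / 2.0)
    return z * z


def isolate(mu1, mu2, threshold=None):
    """Choose between H1 and H2 from two lists of statistics (N = 1, ..., N_m).

    Without a threshold only the isolation rule is applied; with a threshold
    the function may also reject the manoeuvre hypothesis (returns None).
    Returns (hypothesis, window length N at the maximum).
    """
    mu1_max, mu2_max = max(mu1), max(mu2)
    n1 = mu1.index(mu1_max) + 1
    n2 = mu2.index(mu2_max) + 1
    if threshold is not None and max(mu1_max, mu2_max) <= threshold:
        return None, None
    if mu1_max > mu2_max:
        return "H1", n1
    return "H2", n2


# Hypothetical statistics for three window lengths.
threshold = chi2_threshold_1dof(0.01)
print(isolate([0.4, 1.1, 0.9], [3.0, 15.2, 9.7]))
print(isolate([0.4, 1.1, 0.9], [0.2, 2.5, 1.0], threshold))
```

The window length returned with the accepted hypothesis is the quantity
used afterwards as the relative manoeuvre onset time.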



20.6.3. Identification of manoeuvre model parameters 

After the detection of the manoeuvre at the instant $\eta$ using the
detector (20.87) and the isolation of the trajectory type Hj according to
the rule (20.118), the parameters of the manoeuvre model can be
straightforwardly estimated.

Case 1: The hypothesis H1 based on the model CA is declared to be true

In the case of the detection of a uniform speed change the following
parameters of the manoeuvre can be estimated:

• the manoeuvre onset time $\hat{n}_0[\eta]$:

$$\hat{n}_0[\eta] = \eta - N_1, \tag{20.120}$$

where $N_1$ (20.115) is an optimal estimate of the relative time instant of
the start of the manoeuvre;

• the tangential acceleration $\hat{a}_1[\eta]$ and the corresponding
covariance of the estimation error $\sigma_1^2[\eta]$:

$$\hat{a}_1[\eta] = \hat{a}_{1,N_1}[\eta], \tag{20.121}$$

$$\sigma_1^2[\eta] = \sigma_{1,N_1}^2[\eta], \tag{20.122}$$

where the quantities $\hat{a}_{1,N_1}[\eta]$ and $\sigma_{1,N_1}^2[\eta]$ are
calculated according to the equations (20.79) and (20.86), respectively, for
$N = N_1$.

Case 2: The hypothesis H2 based on the model CT is declared to be true

In the case of the detection of the standard turn, the following parameters of
the manoeuvre can be estimated:

• the manoeuvre onset time $\hat{n}_0[\eta]$:

$$\hat{n}_0[\eta] = \eta - N_2, \tag{20.123}$$

where $N_2$ (20.117) is an optimal estimate of the relative time instant of
the beginning of the manoeuvre;

• the normal acceleration $\hat{a}_2[\eta]$ and the corresponding covariance
of the estimation error $\sigma_2^2[\eta]$:

$$\hat{a}_2[\eta] = \hat{a}_{2,N_2}[\eta;K_m], \tag{20.124}$$

$$\sigma_2^2[\eta] = \frac{1}{g_{2,N_2}[\eta;K_m]}, \tag{20.125}$$

with the values of $\hat{a}_{2,N_2}[\eta;K_m]$ and $g_{2,N_2}[\eta;K_m]$ in
the above equations being computed according to the equations (20.98) and
(20.99), respectively;

• the velocity $\hat{v}[\eta]$:

$$\hat{v}[\eta] = \sqrt{\hat{v}_x^2\big[\hat{n}_0[\eta]\,|\,\hat{n}_0[\eta]\big] + \hat{v}_y^2\big[\hat{n}_0[\eta]\,|\,\hat{n}_0[\eta]\big]}; \tag{20.126}$$

• the radius $\hat{r}[\eta]$ of the manoeuvre circle:

$$\hat{r}[\eta] = \frac{\hat{v}^2[\eta]}{|\hat{a}_2[\eta]|}; \tag{20.127}$$

• the turn rate $\hat{\omega}[\eta]$:

$$\hat{\omega}[\eta] = \frac{\hat{a}_2[\eta]}{\hat{v}[\eta]}. \tag{20.128}$$

The motion parameters used in the above formulae are obtained in the
following way: $\hat{x}[\hat{n}_0[\eta]|\hat{n}_0[\eta]]$ and
$\hat{y}[\hat{n}_0[\eta]|\hat{n}_0[\eta]]$ are the estimates of the target
position, while $\hat{v}_x[\hat{n}_0[\eta]|\hat{n}_0[\eta]]$ and
$\hat{v}_y[\hat{n}_0[\eta]|\hat{n}_0[\eta]]$ are the estimates of velocity
taken from the state vector estimate of the base KF for the time instant
$\hat{n}_0[\eta]$.

Note that $N_1$ and $N_2$ are two hypothetical values of an integer
quantity $N$ referring to the estimated relative onset time instant of the
analysed manoeuvres.
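
The turn parameters follow from the estimated velocity components and the
estimated normal acceleration by elementary kinematics. The sketch below
assumes the relations of uniform circular motion (normal acceleration
equal to speed times turn rate, radius equal to speed squared divided by
the magnitude of the normal acceleration); the function name and the
example values are hypothetical.

```python
import math


def turn_parameters(vx, vy, a_n, eta, n_window):
    """Return onset instant, speed, turn radius and turn rate.

    vx, vy   : velocity estimates at the estimated onset instant
    a_n      : estimated normal acceleration (non-zero, sign gives the side)
    eta      : detection instant
    n_window : relative onset time selected by the isolation step
    """
    onset = eta - n_window
    speed = math.hypot(vx, vy)
    radius = speed * speed / abs(a_n)
    turn_rate = a_n / speed
    return onset, speed, radius, turn_rate


# Hypothetical example: 154 m/s, 4 m/s^2, detection at sample 45 with N = 4.
print(turn_parameters(0.0, 154.0, 4.0, 45, 4))
```

For these values the radius is close to 6 km and a full circle would take
roughly four minutes, which gives a feeling for the scale of the manoeuvre
relative to a sampling period of a few seconds.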

20.7. Evaluation of the proposed methods

In this section certain properties of the detection, isolation and
estimation algorithms are analysed. This analysis focuses mainly on the
properties of the exact CCTE and approximate CCTA models of coordinated
turns, the evaluation of the effectiveness of the three analysed manoeuvre
detectors and the manoeuvre isolation procedure. Moreover, the accuracy
of least-squares methods proposed for the identification of the tangential
and normal accelerations (as the control signals) is examined. Additional
goals of this study are an analysis of the sensitivity of DIE algorithms to
their parameterisation and the resulting determination of the values of these
parameters.



20.7.1. Properties of models of the standard turn 

In order to find a relationship between the two CCTE and CCTA models
of coordinated turns, normalised errors are introduced that characterise the
discrete model (20.44)-(20.45) when the approximate CCTA model (20.50)
is used instead of the discretized CCTE one (20.47).

Consider the solutions of the respective difference equations (Kowalczuk
and Sankowski, 2000b) $p^{CCTE}[n_0']$ and $p^{CCTA}[n_0']$, and
$v^{CCTE}[n_0']$ and $v^{CCTA}[n_0']$, where the upper indices denote the
analysed models of coordinated turns. By subtracting
$p^{CCTA}[n_0'] - p^{CCTE}[n_0']$ we obtain a cumulative position error,
which can be normalised with respect to both the radius $r$ of the manoeuvre
circle and the manoeuvre time period $n_0' - n_0$ as

$$\delta p[n_0, n_0', \omega, T] = \frac{1}{r(n_0' - n_0)}\big\{p^{CCTA}[n_0'] - p^{CCTE}[n_0']\big\}
= \frac{(\omega T)^2}{2(n_0' - n_0)} \sum_{i=n_0}^{n_0'-1} \big(2(n_0' - i) - 1\big)\,u_2[i] + \omega T\,u_1[n_0] + \frac{1}{n_0' - n_0}\big(u_2[n_0'] - u_2[n_0]\big). \tag{20.129}$$

Similarly, by subtracting $v^{CCTA}[n_0'] - v^{CCTE}[n_0']$ a cumulative
velocity error is obtained that can be normalised with respect to both the
object velocity $v$ and the manoeuvre time period $n_0' - n_0$ as

$$\delta v[n_0, n_0', \omega, T] = \frac{1}{v(n_0' - n_0)}\big\{v^{CCTA}[n_0'] - v^{CCTE}[n_0']\big\}
= \frac{\omega T}{n_0' - n_0} \sum_{i=n_0}^{n_0'-1} u_2[i] - \frac{1}{n_0' - n_0}\big(u_1[n_0'] - u_1[n_0]\big), \tag{20.130}$$

for $n_0' > n_0$.

It can be easily shown that the errors defined above have the following
asymptotic properties:

$$\lim_{\omega T \to 0} \delta p[n_0, n_0', \omega, T] = 0_{2\times 1}, \tag{20.131}$$

$$\lim_{\omega T \to 0} \delta v[n_0, n_0', \omega, T] = 0_{2\times 1}. \tag{20.132}$$

Two trajectories generated using the CCTE and CCTA models, which
qualitatively show the effect of a mismatch between the models for two
scenarios with different manoeuvre intensities, are presented in
Figs. 20.9(a) and 20.9(b). The trajectories are characterised by the
following motion parameters: $v = 100$ [m/s], $T_n = T = 4$ [s],
$a_n = 2.5$ [m/s²] (a) and $a_n = 6$ [m/s²] (b).




Fig. 20.9. Mismatch between trajectories generated using 
the exact CCTE and approximate CCTA models 

Based on the relations given in (20.129)-(20.132), Fig. 20.9, and the
analysis done by Kowalczuk and Sankowski (2000b), we can now conclude that
the longer the sampling period and/or the more intensive the circular
manoeuvre, the bigger the mismatch between the exact CCTE and approximate
CCTA models.
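
This conclusion can be checked numerically with a short sketch. The code
below is an illustration under stated assumptions: the function name
`normalised_errors` is hypothetical, the sampling period is constant, both
models start from the same state, the CCTA trajectory is propagated step by
step with the approximate conversion vector, and the CCTE trajectory is
represented by the exact circular motion sampled at the final instant.

```python
import math


def u1(psi):
    """Tangential versor [sin(psi), cos(psi)]."""
    return (math.sin(psi), math.cos(psi))


def u2(psi):
    """Normal versor [cos(psi), -sin(psi)]."""
    return (math.cos(psi), -math.sin(psi))


def normalised_errors(omega, T, steps, v=100.0, psi0=0.0):
    """Normalised cumulative position and velocity errors after `steps` samples."""
    a_n = omega * v          # normal acceleration
    r = v / omega            # radius of the manoeuvre circle
    # Approximate model: acceleration frozen at its value at each sample.
    pos = [0.0, 0.0]
    vel = [v * c for c in u1(psi0)]
    for i in range(steps):
        n = u2(psi0 + omega * T * i)
        pos = [pos[k] + T * vel[k] + 0.5 * T * T * a_n * n[k] for k in range(2)]
        vel = [vel[k] + T * a_n * n[k] for k in range(2)]
    # Exact circular motion at the final instant.
    psi_end = psi0 + omega * T * steps
    pos_exact = [-r * (u2(psi_end)[k] - u2(psi0)[k]) for k in range(2)]
    vel_exact = [v * c for c in u1(psi_end)]
    dp = [(pos[k] - pos_exact[k]) / (r * steps) for k in range(2)]
    dv = [(vel[k] - vel_exact[k]) / (v * steps) for k in range(2)]
    return math.hypot(*dp), math.hypot(*dv)


# Hypothetical comparison: v = 100 m/s, T = 4 s, ten samples of manoeuvre.
for a_n in (0.25, 2.5, 6.0):
    print(a_n, normalised_errors(a_n / 100.0, 4.0, 10))
```

For these settings both error norms increase with the manoeuvre intensity
and become negligible for the weakest turn, in line with the asymptotic
properties stated above.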

20.7.2. Simulation tests 

The efficiency of the proposed methods has also been tested by simulation.
Suitable runs have been performed based on the testing scenarios
recommended by EUROCONTROL (1993) for the tracking of a target in a Major
Terminal Area by means of a Primary Surveillance Radar. Among other
details, the document defines uniform motion as a movement of a target
influenced by tangential and normal accelerations a_t < 0.01 [m/s^2]
and a_n < 0.1 [m/s^2], and characterises the range of civil aircraft
velocities (v = 20, ..., 410 [m/s]) and the expected intensities of
manoeuvres: a_t = 0.3, ..., 1.2 [m/s^2] and a_n = 2.5, ..., 6 [m/s^2].

In the following simulation, on the basis of EUROCONTROL (1993), two 
testing scenarios (target trajectories) are used: 

1. The target moves along a straight line trajectory at a constant speed
v = 154 [m/s] (H0) up to the time instant t_0 = 160 [s], at which it
starts to accelerate (H1) with a tangential acceleration a_t = 1 [m/s^2].

2. The target moves along a straight line trajectory at a constant speed
v = 154 [m/s] (H0) up to the time instant t_0 = 160 [s], at which it
starts to turn (H2) with a normal acceleration a_n = 4 [m/s^2].

A radar, rotating with ART = 4 [s], is assumed to provide unbiased
measurements of the target position in polar RA coordinates perturbed by
additive Gaussian-distributed uncorrelated measurement errors in range ρ
and azimuth α described by their standard deviations σ_ρ = 120 [m] and
σ_α = 0.15 [deg], respectively (see Section 20.2).

Based on the two scenarios defined above, two sets (100 elements each) of 
testing tracks have been generated for both testing trajectories, characterised 
by statistically independent realizations of the measurement process. These 
tracks have been used as the testing data for the tracking algorithm, the 
results of which are presented in the following subsections. 
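
A minimal sketch of how such a set of testing tracks could be produced for
scenario 1 is given below. It uses only the quantities stated above (scan
period, speed, onset time, tangential acceleration and the two standard
deviations). The starting point of the target, the number of scans, the
azimuth convention (angle measured from the x axis) and all function names
are assumptions made for illustration.

```python
import math
import random

T = 4.0                            # radar revolution period [s]
SIGMA_RHO = 120.0                  # range error standard deviation [m]
SIGMA_ALPHA = math.radians(0.15)   # azimuth error standard deviation [rad]


def scenario1_positions(n_scans, v=154.0, t0=160.0, a_t=1.0,
                        x0=50_000.0, y0=20_000.0):
    """True positions for scenario 1: straight line along x at speed v until
    t0, then tangential acceleration a_t. (x0, y0) is an assumed start."""
    positions = []
    for k in range(n_scans):
        t = k * T
        dt = max(0.0, t - t0)      # time elapsed since the manoeuvre onset
        positions.append((x0 + v * t + 0.5 * a_t * dt ** 2, y0))
    return positions


def measure(positions, rng):
    """Polar (range, azimuth) measurements with additive Gaussian errors."""
    track = []
    for x, y in positions:
        rho = math.hypot(x, y) + rng.gauss(0.0, SIGMA_RHO)
        alpha = math.atan2(y, x) + rng.gauss(0.0, SIGMA_ALPHA)
        track.append((rho, alpha))
    return track


rng = random.Random(0)             # fixed seed for repeatability
truth = scenario1_positions(n_scans=60)
tracks = [measure(truth, rng) for _ in range(100)]  # 100 realisations
```

Each element of `tracks` is one statistically independent realisation of the
measurement process along the same true trajectory.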

It has been assumed that the input noise covariance matrix W =
2^diag{1, 1}, which results from the assumed target dynamics. The
covariance matrix R of the measurement errors reflects the assumed standard
deviations of the slant range and azimuth measurements.

20.7.3. Parametric identification of manoeuvre models 

In this subsection the evaluation of the GLS-based parametric
identification procedures, Input Estimation (IE) and Non-linear Input
Estimation (NIE), is presented. This includes a statistical analysis of the
estimates of the manoeuvre onset time instant n_0 and of the tangential a_t
and normal a_n accelerations for different estimator memory lengths.

In order to evaluate the accuracy of the estimator of the manoeuvre onset
time instant n_0, the minimum, average and maximum estimation errors are
presented, measured in sampling periods [Sa]. This allows inspecting the
probability distributions of the errors in a way similar to the standard
analysis of histograms. The estimation errors of accelerations are
described by their mean value (bias) and standard deviation (RMS error).
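
The indices described above can be computed as in the following sketch. It
assumes that onset-time errors are taken as absolute values expressed in
sampling periods, and that the spread of the acceleration error is the
population standard deviation; both choices, the function names and the
sample data are assumptions made for illustration.

```python
import statistics


def onset_error_summary(onset_estimates, true_onset):
    """Minimum, average and maximum absolute onset-time error [Sa]."""
    errors = [abs(n_hat - true_onset) for n_hat in onset_estimates]
    return min(errors), statistics.mean(errors), max(errors)


def acceleration_error_summary(estimates, true_value):
    """Bias (mean error) and standard deviation of the estimation error."""
    errors = [a_hat - true_value for a_hat in estimates]
    return statistics.mean(errors), statistics.pstdev(errors)


# Hypothetical estimates from five Monte Carlo runs.
print(onset_error_summary([40, 40, 41, 39, 40], true_onset=40))
print(acceleration_error_summary([1.1, 0.9, 1.0, 1.2, 0.8], true_value=1.0))
```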

Table 20.2 shows the estimation errors of the manoeuvre onset time
instant, while Table 20.3 presents the biases and RMS errors of the
estimates of the tangential acceleration for the simulation scenario 1 and
different memory lengths N = 1, ..., 10 of the IE estimator (20.79).

Table 20.2. Estimation errors of the manoeuvre onset time instant n_0 (for
the scenario 1) obtained from the set of IE estimators based on the CCA
model, in [Sa]

Error                          Estimator memory length N
\hat{n}_0 - n_0      1      2      3      4      5      6      7      8      9     10
min.              0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00
avg.              0.00   0.54   0.75   1.05   0.87   0.68   0.24   0.10   0.03   0.00
max.              0.00   1.00   2.00   3.00   4.00   5.00   6.00   2.00   2.00   0.00



Table 20.3. Biases (mean) and standard deviations (std., RMS errors)
of the estimates of the tangential acceleration \hat{a}_t obtained from the
set of IE estimators based on the CCA model (for the scenario 1), in [m/s^2]

Error                          Estimator memory length N
\hat{a}_t - a_t      1      2      3      4      5      6      7      8      9     10
mean              2.01   0.13  -0.03   0.03   0.01   0.02   0.01   0.00   0.00   0.00
std.             16.84   3.81   1.83   0.85   0.51   0.39   0.29   0.20   0.15   0.12



Table 20.4 shows the estimation errors of the manoeuvre onset time
instant, while Table 20.5 presents the biases and RMS errors of the
estimates of the normal acceleration for the simulation scenario 2 and
different memory lengths N = 1, ..., 10 of the NIE estimator (20.98).



Table 20.4. Estimation errors of the manoeuvre onset time instant n_0
(for the scenario 2) obtained from the set of NIE estimators based on the
two CCT models, in [Sa]

Error                          Estimator memory length N
\hat{n}_0 - n_0      1      2      3      4      5      6      7      8      9     10
CCTE model
min.              0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00
avg.              0.00   0.32   0.33   0.06   0.01   0.00   0.00   0.00   0.00   0.00
max.              0.00   1.00   2.00   2.00   1.00   0.00   0.00   0.00   0.00   0.00
CCTA model
min.              0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00
avg.              0.00   0.32   0.29   0.06   0.01   0.00   0.00   0.00   0.00   0.00
max.              0.00   1.00   2.00   2.00   1.00   0.00   0.00   0.00   0.00   0.00

Table 20.5. Biases (mean) and standard deviations (std.,
RMS errors) of the estimates of the normal acceleration \hat{a}_n
obtained from the set of NIE estimators based on the two analysed
CCT models (for the scenario 2), in [m/s^2]

Error                          Estimator memory length N
\hat{a}_n - a_n      1      2      3      4      5      6      7      8      9     10
CCTE model
mean              1.90   0.41   0.17   0.13   0.08  -0.03  -0.03  -0.03  -0.01   0.00
std.             19.09   4.24   1.55   0.97   0.62   0.38   0.29   0.24   0.17   0.13
CCTA model
mean              1.59   0.46   0.18   0.14   0.10  -0.01  -0.01  -0.01   0.01   0.02
std.             18.83   4.28   1.55   0.98   0.62   0.39   0.30   0.25   0.18   0.14



From Tables 20.2-20.5 we can state that both estimators are consistent,
because their estimation errors decrease as the amount of the processed
data (the estimator memory length) increases. Relatively large estimation
errors in the case of the scenario 1 (Tables 20.2 and 20.3) can be observed
for the estimator lengths N < 8. This is due to the small intensity of
this manoeuvre (and a low signal-to-noise ratio).
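
This trend can be checked directly on the tabulated values. The sketch
below takes the standard deviations of Table 20.3 and reports whether they
fall monotonically with N, together with the shortest memory length whose
error spread drops below a chosen limit. The 25 [%] limit relative to the
true acceleration and the function names are assumptions made for
illustration.

```python
# Standard deviations of the tangential-acceleration estimates,
# copied from Table 20.3, in [m/s^2], indexed by memory length N.
std_by_N = {1: 16.84, 2: 3.81, 3: 1.83, 4: 0.85, 5: 0.51,
            6: 0.39, 7: 0.29, 8: 0.20, 9: 0.15, 10: 0.12}
TRUE_A_T = 1.0  # tangential acceleration in scenario 1 [m/s^2]


def is_monotonically_decreasing(values):
    """True if every value is strictly smaller than its predecessor."""
    return all(b < a for a, b in zip(values, values[1:]))


def shortest_memory_below(std_table, limit):
    """Smallest memory length N whose error std is below `limit`, or None."""
    for N in sorted(std_table):
        if std_table[N] < limit:
            return N
    return None


stds = [std_by_N[N] for N in sorted(std_by_N)]
print(is_monotonically_decreasing(stds))
print(shortest_memory_below(std_by_N, 0.25 * TRUE_A_T))
```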

The indices presented in Tables 20.4 and 20.5 show a slightly better qual- 
ity of the iterative NIE algorithm based on the exact CCTE model of the 
standard turn as compared to the analogous algorithm based on the approxi- 
mate model CCTA. However, the difference is not as big as could be expected 
from the analysis of the properties of the models alone (see Subsection 20.7.1). 

It is obvious from Tables 20.2 and 20.4 (see the zero-error columns) that
longer horizons (N) of observation assure perfect onset time estimation.



20.7.4. Manoeuvre recognition 

Tables 20.6 and 20.7 show the estimates of the efficiency of the isolation 
algorithms based on the two analysed discrete CCT models for the testing 
scenarios 1 and 2, respectively. As the quality index the percentage of correct 
isolations has been chosen. 

Properties parallel to those mentioned in the previous subsection can
be observed in Tables 20.6 and 20.7. Namely, the efficiency of the isolation
procedure against the H1 manoeuvre (the scenario 1) is poor for N < 8. Here
this effect can also be connected with the small intensity of the manoeuvre.

Additionally, it is worth noticing that the impact of the type of the CCT
model used in the isolation procedure does not seem to be significant.

Table 20.6. Efficiency of the isolation of the uniform speed change
(for the scenario 1) by means of the set of NIE algorithms based on the
CCTE and CCTA models, in [%] of correct classifications

CCT                  Estimator memory length N
model      1     2     3     4     5     6     7     8     9    10
CCTE      52    52    55    63    71    82    95    99   100   100
CCTA      51    52    53    61    72    83    95    99   100   100



Table 20.7. Efficiency of the isolation of the standard turn (for the
scenario 2) by means of the set of NIE algorithms based on the CCTE
and CCTA models, in [%] of correct classifications

CCT                  Estimator memory length N
model      1     2     3     4     5     6     7     8     9    10
CCTE      47    63    82   100   100   100   100   100   100   100
CCTA      47    63    82   100   100   100   100   100   100   100



20.7.5. Manoeuvre detection 

The analysis of the efficiency of the detectors considered requires
evaluating their properties with respect to the assumed types of manoeuvres
and different values of the probability of false manoeuvre alarms P_FA used
for determining the detection threshold. The basic attributes of the
manoeuvre detectors examined below include the level of the generated false
manoeuvre alarms and the reaction time of the detector (referred to as a
delay).

In the following tests a dead-zone N_m has been introduced into the EIE
and DIE manoeuvre detectors in order to eliminate possible detections based
on the shorter estimator lengths N < N_m. The value of N_m = 3 has been
chosen based on the analysis of the estimation and isolation procedures,
whose results appeared to be unacceptable for N < 3. Similarly, the
parameter α of the VDF detector has been set to a value which results in an
effective window length N_d = 4. The maximum memory lengths of the EIE and
DIE estimators have been fixed to N_M = 15.

Table 20.8 presents the percentage of the processed tracks affected by
false manoeuvre alarms for four different values of P_FA.

A short analysis of the simulation results given in Table 20.8 shows that
the VDF detector is characterised by the lowest level of false manoeuvre
alarms for the same value of P_FA. This property is especially noticeable
for P_FA = 10^-?. Such large differences in the false manoeuvre alarm rate
between the three detectors are due to different functional structures of the







Table 20.8. Percentage of the tested trajectory realisations which
caused false manoeuvre alarms, in [%] of affected tracks

Detector             Probability of false alarms P_FA
type        10^-?    10^-?    10^-?    10^-?    10^-?
VDF           5.5      0.0      0.0      0.0      0.0
EIE          29.0        ?        ?      0.0      0.0
DIE          42.0      8.5        ?      0.0      0.0

detectors. The VDF detector tests the innovation process for the presence
of a bias using the fading memory mechanism, and its operation can be
related to the corresponding effective window length N_d = 4. The EIE
detector tests the innovation process using independent estimators matched
with the N_M - N_m = 12 manoeuvres H1 with different durations. The DIE
detector is functionally similar to the EIE detector except that it is
equivalent to 12 estimators matched with the H1 manoeuvres plus
12 estimators matched with the H2 manoeuvres of different durations.
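
The structural difference can be made concrete with a short sketch. It
assumes the common fading-memory convention in which a forgetting factor
(called `alpha` here) gives an effective window length 1 / (1 - alpha); the
VDF definition used in this chapter is not reproduced in this section, so
that relation, like the function names, is an assumption. The estimator
counts follow from the dead-zone of 3 and the maximum memory length of 15
quoted above.

```python
def fading_memory_statistic(normalised_innovations_sq, alpha):
    """Exponentially discounted running sum of normalised squared
    innovations (assumed form of a fading-memory test statistic)."""
    stat, history = 0.0, []
    for eps in normalised_innovations_sq:
        stat = alpha * stat + eps
        history.append(stat)
    return history


def alpha_for_window(effective_length):
    """Forgetting factor for an effective window length 1 / (1 - alpha)."""
    return 1.0 - 1.0 / effective_length


N_m, N_M = 3, 15                 # dead-zone and maximum memory length
n_eie = N_M - N_m                # estimators matched with H1 manoeuvres
n_die = 2 * (N_M - N_m)          # H1 plus H2 estimators
alpha = alpha_for_window(4)      # factor for an effective window of 4

print(n_eie, n_die, alpha)
```

A detector that evaluates more hypotheses at every scan has more
opportunities to cross its threshold, which is in line with the ordering of
the false alarm levels in Table 20.8.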

Tables 20.9 and 20.10 show the minimum, average and maximum delays 
of manoeuvre detection for the scenarios 1 and 2, respectively. 



Table 20.9. Delay η - n_0 of the detection of the
uniform speed change (the scenario 1), in [Sa]

Delay              Probability of false alarm P_FA
η - n_0     10^-?    10^-?    10^-?    10^-?    10^-?
VDF detector
min.          5.0      6.0      7.0      7.0      7.0
avg.          7.8        ?      9.3      9.9     10.3
max.         10.0     11.0     11.0     12.0     12.0
EIE detector
min.          4.0      5.0      5.0      6.0      7.0
avg.          6.6        ?        ?      8.4      8.9
max.         10.0     10.0     10.0     10.0     10.0
DIE detector
min.          4.0      5.0      5.0      6.0      7.0
avg.          6.6      7.1      ?.8      8.2      8.6
max.         10.0     10.0     10.0     10.0     10.0

Table 20.10. Delay η - n_0 of the detection
of the standard turn (the scenario 2), in [Sa]

Delay              Probability of false alarm P_FA
η - n_0     10^-?    10^-?    10^-?    10^-?    10^-?
VDF detector
min.          4.0      4.0      4.0      4.0      5.0
avg.          4.4      4.8      5.1      5.3      5.7
max.          5.0      6.0      6.0      6.0      6.0
EIE detector
min.          4.0      4.0      4.0      4.0      4.0
avg.          4.4      4.6        ?      4.9      5.1
max.          6.0      6.0      7.0      7.0      7.0
DIE detector
min.          4.0      4.0      4.0      4.0      4.0
avg.          4.5      4.5      4.?      4.8      5.0
max.          6.0      6.0      6.0      7.0      7.0



As can be observed from Tables 20.9 and 20.10, the structural (and
functional) difference between the VDF and EIE/DIE detectors strongly
affects the reaction times of the detectors. The average delays of the
detection of the H1 manoeuvre (see Table 20.9) introduced by the EIE/DIE
detectors are approximately 1.5 [Sa] (6 [s]) smaller than those
characterising the VDF detector for the same value of P_FA.

This property is less noticeable in the case of the detection of the H2
manoeuvre (see Table 20.10) because the VDF detector, characterised by a
relatively short effective memory length N_d = 4, fits better manoeuvres of
larger intensity (the scenario 2).



20.7.6. Optimal window length of the IE/NIE estimator

The accuracy of the identification of manoeuvre parameters (onset time
instant and intensity) as well as the reliability of the manoeuvre isolation
procedure improve with an increase in the IE/NIE estimator memory length
N, as shown in Subsections 20.7.3 and 20.7.4. Unfortunately, because radar
tracking applications require a short reaction time (adaptivity) of the
state estimators, the length N of the IE/NIE estimator cannot be set
arbitrarily. This can be interpreted as a typical problem of identification,
which lies in a compromise between the adaptivity and accuracy of the
estimator.

In fact, the necessity of coping with both small-intensity (H1) and
large-intensity (H2) manoeuvres in the identification procedure precludes
the use of any manoeuvre detector characterised by a fixed effective-memory
length (e.g. the VDF detector). It is more promising to use instead a bank
of filters matched with the expected spectrum of manoeuvres (Lanka, 1984;
McAulay and Denlinger, 1973), as is done in the case of the analysed
EIE-based detector (Park et al., 1995).

In this case, the desired window length of the IE/NIE estimator should
be selected based on the type of the manoeuvre, its intensity and the value
of the detection threshold. The adaptive way in which we have constructed
the estimators of the onset time moment permits the (instrumental)
determination of the relative onset time, which can also be interpreted as
the optimal value of the length of the analysed estimator (IE/NIE). The
value N_j, j ∈ {1, 2}, is determined according to (20.120) or (20.123).

Since N = N_j, for a suitably chosen j by (20.118) or (20.119), is supposed
to be close to the delay of manoeuvre detection, the less intensive the
observed manoeuvre, the larger the value of N (this results from the
adaptive choice of the estimator length). In other words, the manoeuvre is
detected only if it causes a sufficiently large bias in the innovation
process of the base KF. In such a case we claim that the informative content
of the bias resulting from the manoeuvre is sufficient for the estimation of
the parameters of the assumed manoeuvre model. This also confirms the
engineering principle that the effective memory length N should approach an
optimum in the sense of a trade-off between the adaptivity and accuracy of
the parametric identification procedure.



20.8. Summary 

In this chapter a new DIE method has been proposed for tracking civil
aircraft, which allows detecting target manoeuvres, isolating the
best-fitted manoeuvre model and performing the parametric identification of
the assumed models. This approach utilises analytical models of target
trajectories including: a model of uniform motions with a constant velocity,
a model of uniform speed changes (with a controlled constant acceleration,
CCA) and a model of standard turns (with a controlled coordinated turn,
CCT). When a Kalman filter based on the CV model is used for tracking a
manoeuvring object, the innovation process of this KF gets biased. The
analytical models of manoeuvres are used for modelling the bias. They also
constitute the basis for suitable identification procedures (IE/NIE).

The proposed method of detection-isolation-estimation allows designing 
an adaptive multiple-model state estimator based on switched Kalman filters, 
in which only one of the analysed models is assumed to be the correct one 
for a given time interval. 

The presented simulation experiments have shown the suitability of the
proposed algorithms. It has been observed that iterative NIE estimators
based on the analysed CCTE and CCTA models converge quickly when used with
data matched with the CCT model. Moreover, both the IE and NIE estimators
are consistent. The analysis of the accuracy of NIE algorithms based on the
accurate CCTE and approximated CCTA models has shown that the effect of the
applied model on the overall performance of the estimator considered is
rather insignificant.

The two detectors VDF and EIE have been examined with respect to the
reliability of the detection of the two analysed manoeuvre types (H1 and H2),
the reaction time (delay), and the level of false manoeuvre alarms. On the
basis of the performed simulation and a short discussion of the properties
of these detectors we can conclude that the EIE detector, characterised by
P_FA = 10^-?, ..., 10^-?, establishes a solid background for practical
applications.

Chapter 21



DETECTING AND LOCATING LEAKS 
IN TRANSMISSION PIPELINES 

Zdzisław KOWALCZUK*, Keerthi GUNAWICKRAMA*



21.1. Introduction 

Large pipeline networks are widely used to transport various fluids (liquids
and gases) from production sites to consumption sites. Transmission
pipelines are the most essential parts of such networks. They are used to
transport fluids over very long distances, ranging from several kilometres
to hundreds or thousands of kilometres. They have few branches and no
loops, operate at relatively high pressure and need compressor stations,
which generate the energy the fluid needs to move through the pipeline.
Despite huge initial establishment costs, in the long run transmission
pipelines are convenient and economical for the transportation of large
volumes of fluid. In various industries transmission pipelines are often
indispensable.

The safe handling of such networks is of the greatest importance due to the
serious consequences which may result from faulty operations. Reports
indicate that in spite of strict regulations and enforcements (API, 2000;
Muhlbauer, 1992), the frequency of pipeline accidents has not changed
significantly over the last two decades (Hovey and Farmer, 1999). In
particular, leaks appear quite frequently due to material ageing, pipe
burst, the cracking of the welding seam, hole corrosion, and unauthorised
actions by third parties. They can cause human casualties, environmental
catastrophes, and product losses. Any leak of a fluid (such as crude oil,
natural gas, or petrochemical products) can bring about a dangerous
explosion. The emission of toxic gases through leakage may have a serious
impact on human life and the environment. In addition to these losses,
restoring the environmental balance is usually expensive. Therefore,
transmission pipelines need to be under permanent observation for an early
detection of leaks, so that most negative effects can be minimised through
a proper counteraction, which can be an early and safe shutdown of the pipe,
a rapid dispatch of crews for inspection and cleanup, etc. An effective and
appropriately implemented Leak Detection System (LDS) pays off through a
reduced spill volume and an increase in public confidence.

Any LDS system performs leak detection, according to diagnostic
principles (see Chapter 5), by means of the following fundamental functions:

• the leak alarm decision, disclosing any unintended product release
(leak) which has actually occurred,

• an immediate determination of the leak size and location, meaning the
system capability to isolate the place along the pipeline where the product
breaches the pipe wall (leak location), and to identify the amount of the
product release or the rate of it (leak size).

The reliability of triggered alarms is a crucial attribute of an LDS
because false alarms yielded under leak-free operations generate extra work
for servicing personnel, reduce the confidence that operators have in the
system, and raise the chance of overlooking a real leak (Zhang, 1997). Other
important factors are the smallest¹ leak flow rate that can be detected and
the time it takes to raise the alarm. Precise information on the leak
location and the leak size (leak parameters) is most valuable for a proper
assessment and action.

Pipelines are often subjected to unsteady operating conditions
(operational statuses, transients of flow and pressure), which arise from
valve changes, enforced throughput shifts (pump starts or shut-downs), and
pigging, for instance. Transients also appear when a pipeline is not
entirely filled up with a product. Importantly, apparently similar effects
arise when a leak occurs. Therefore, the LDS system should differentiate
between the varying operational conditions and the leaks (Kowalczuk and
Gunawickrama, 2001).

In view of the above, the quality of leak detection understood as the 
effectiveness of the LDS, primarily concerned with performance-related issues, 
can be assessed by scanning the following system attributes (API, 1995): 

• sensitivity, determined as a composite measure of the minimum size
of leaks which can be detected, and the time required to yield the alarm in
that case;

• accuracy, defined as a magnitude relating to the estimated
parameters² of the flow rate and location of the leak: in practice,
estimates found with an acceptable degree of tolerance (a percentage of
their real values) are considered accurate;

• reliability, taken as the system's capacity to render accurate decisions
about the existence of a leak and directly related to the probabilities of
detecting the existent leaks and making incorrect declarations (false
alarms³): note that by using additional information to disqualify, limit,
or inhibit alarms, the rate of leak declarations may be diminished;

• robustness, meaning a measure of the LDS ability to continue its
functioning and providing useful information under substantial and varying
operating conditions or, at least, to distinguish between the normal
transient operations and real leaks.

¹ The size of leaks smaller than 1 [%] of the average flow rate shall be
considered small here.

² Often, a total volume loss is also of interest (API, 1995).

The LDS system can also be expected to have other desirable
utilities (API, 1995), such as allowing a well-timed detection of product
releases; servicing all types of gases and liquids; being configurable to
complex pipeline networks; providing product measurements and inventory
compensations (i.e., temperature, pressure and density) for possible
corrections; supplying a pipeline real-time pressure profile; being
redundant (including several leak detection techniques which function in
parallel); accommodating slack-line and multiphase flow conditions; having
dynamic alarm thresholds and line pack constant; accounting for the effect
of a drag-reducing agent; performing accurate imbalance calculations on
flow meters; assisting with product blending; permitting the heat transfer;
requiring minimum software and hardware configuration and tuning.

In general, methods used to detect product leaks in pipelines can be
divided into two categories (ADEC, 1996):

• external (direct) methods, which detect the product leaking outside
the pipeline and include traditional procedures such as the right-of-way
inspection by line patrols and technologies like hydrocarbon sensing via
fibre optic or dielectric cables;

• internal (inferential or analytical) methods, also known as
Computational Pipeline Monitoring (CPM), which use instruments to monitor
certain internal pipeline quantities (such as pressure, flow, temperature,
etc.), which are inputs to a CPM procedure, and infer an existing product
release from suitable signal computations.

As opposed to the external approach, the internal methods can be quite 
sensitive to considerably small leaks and have the ability of detecting those 
rapidly and accurately. However, developing reliable internal leak detection 
methods is a challenging task due to large dimensions of pipelines, system 
nonlinearity, and the existing operational constraints. What is more, the num- 
ber of the available measurement points is strongly limited, and the system’s 
internal states are usually unknown (Marques and Morari, 1988). 



³ See, for instance, Chapter 20.

In this chapter we shall consider two analytical methods of an effective
detection of leaks in transmission pipelines:

i) a method sensitive to faults (based on modelling and correlation
techniques), which shall be referred to as the Fault Sensitive Approach
(FSA), and

ii) a method using a model of faults and a Kalman correction, which will
be called the Fault Model Approach (FMA),

which are based on nonlinear mathematical models of the transported fluid
flow and nonlinear observation techniques, making use of the pressure
and flow measurements at the boundaries of the monitored pipeline.



21.2. Transmission pipeline process 

To assist the discussion that will follow, let us first describe the process of 
transporting a fluid (a gas or a liquid) through a pipeline of given character- 
istics. 



21.2.1. Pipe instrumentation 

A typical transmission pipeline consists of sources, loads and pumping
stations, and pipelegs linking all these elements. The sources collect the
fluid from various wells or other pipelines. The fluid is delivered through
the pipeline to customer sites, which can be treated as loads. The energy
necessary to move the fluid through the pipe is generated by various
devices, such as pumps, motors or compressors, which are located at pumping
stations.

A typical instrumentation of a transmission pipeline along with the mea- 
suring and control devices is shown in Fig. 21.1. 



[Figure: pump, valve and sensors at the pipe inlet; pipeleg; sensors and
valve at the pipe outlet; tank (load); distance axis z running from 0 to Z]

Fig. 21.1. Instrumentation of a pipeline and its measurements: pressure
at the inlet p_in and the outlet p_ex; flow rate at the inlet q_in and the
outlet q_ex; distance from the inlet z; pipe length Z

As opposed to most industrial plants, transmission pipelines are not well
instrumented and usually have only a few technological points, up to fifty
or more kilometres apart. Such points are often equipped with sensors for
pressure (transducers), flow rate (flow meters) and temperature, used for
the supervision and control of the process.



Table 21.1. Parameters of a leaking transmission pipeline

Parameter                                    Symbol           Unit

Transmission pipe
Axial coordinate of distance in the pipe     z                [m]
Axial coordinate of the height of the pipe   h                [m]
Time coordinate                              t                [s]
Fluid pressure                               p                [N/m^2]
Fluid mass                                   m                [kg]
Fluid mass flow rate                         q = dm/dt        [kg/s]
Fluid velocity                               w                [m/s]
Fluid density                                ρ                [kg/m^3]
Isothermal velocity of sound in the fluid    c = sqrt(p/ρ)    [m/s]
Fluid friction (resistance) coefficient      λ                -
Pipe length                                  Z                [m]
Pipe diameter                                D                [m]
Cross sectional area of the pipe             A = πD^2/4       [m^2]
Angle of inclination                         α                [rad]

Fluid leak
Leak location                                z_L              [m]
Leak size (in terms of the mass flow rate)   q_L              [kg/s]
Moment of the leak occurrence                t_L              [s]

Other parameters
Gravity acceleration                         g                [m/s^2]
Sampling interval of measurements            T_s              [s]



21.2.2. Technical parameters of the pipe 

The pipeline and leakage under consideration can be characterised by
the parameters listed in Table 21.1. Certain parameters, such as the pressure
or flow rate, are space- and time-dependent variables. Therefore,
superscripts will be used to identify the time stamp of a variable and
subscripts to denote its space location. For instance, pressure in the
location z and at the time t will be represented as p_z^t.

21.2.3. Technological effects of leaks on pipeline measurements 

When the pipeline is operating under stationary steady-state conditions (a 
status with no leakage), the pressure and flow rate of the fluid at any given 
point along the pipeline are almost constant in time. The pressure drop along 
the pipeleg is approximately a linear function, and flow rates through the inlet 
and outlet are approximately equal (Farmer, 1989). 

With the occurrence of a leak, a sudden pressure drop develops at the
point of the leak location resulting in a rapid re-pressurisation wave, known
as the low pressure rarefaction wave (negative pressure, or expansion wave).
It contains the foremost information about the leak and propagates in both
directions away from the leak location along the pipeline at the velocity of
sound specific for a given fluid (ADEC, 1996; Farmer, 1989; Zhang, 1997).

Thus, theoretically, the influence of the leak should be read by means of 
measuring devices after an interval of time related to the distance from the 
place of the leak to the sensor and the velocity of sound of the fluid. This 
time interval determines the minimum time necessary for detecting the leaks. 
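
The relation between the distance and the minimum detection time can be
sketched as follows. The function name and the numerical values (a 50 km
pipeleg, a leak 20 km from the inlet, a sound velocity of 300 m/s) are
hypothetical and serve only as an illustration.

```python
def wave_arrival_times(z_leak, pipe_length, sound_speed):
    """Times [s] after the leak onset at which the rarefaction wave reaches
    the inlet and the outlet sensors, assuming a constant sound velocity."""
    if not 0.0 <= z_leak <= pipe_length:
        raise ValueError("leak location must lie within the pipe")
    t_inlet = z_leak / sound_speed
    t_outlet = (pipe_length - z_leak) / sound_speed
    return t_inlet, t_outlet


# Hypothetical example: Z = 50 km, z_L = 20 km, c = 300 m/s.
t_in, t_ex = wave_arrival_times(20_000.0, 50_000.0, 300.0)
print(t_in, t_ex)
```

The smaller of the two times is the earliest moment at which any boundary
sensor can register the leak; the larger one is the moment from which both
boundary measurements are affected.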

The characteristic behaviour in reaction to the leak occurrence is followed
by significant changes in the mass flow and pressure along the pipeline and,
consequently, in all measurements. In general, the changes, as compared to
the stationary steady state prior to the leak occurrence, are as follows
(Siebert and Klaiber, 1980):

• mass flow at the inlet q_in increases (↗),

• mass flow at the outlet q_ex decreases (↘),

• pressure at the inlet p_in can slightly decline (↘),

• pressure at the outlet p_ex can also slightly decline (↘).

Thus, following the rarefaction wave, the pipe dynamics transform to a 
new stationary steady state. Namely, the pressure drop is at its maximum 
near the leak position and decreases along with the distance from the leak 
location. The constant inclination breaks into two, and makes two different 
curves with different inclinations which intersect at the spot of the leakage 
(Fig. 21.2(a)). However, just after the leak occurrence, the mass which flows 
towards the leak location increases and the mass flowing away from the leak 
location decreases in time until it reaches a new steady state (Fig. 21.2(b)). 

The above leak-resultant physical phenomena and their influence on mea- 
surements are the basis for analytical leak detection methods. 

Fig. 21.2. Pressure profile (a) and transient of the mass flow
(b) before and after the leak occurrence

The rate of the unintended mass loss, referred to as the leak size q_L, is
equal to the difference between the incoming and outgoing mass flow rates at
the leak location. On the assumptions made, it can be calculated based on
the inlet and outlet flow rate measurements:

q_L^t = q_in^t - q_ex^t,   (21.1)

where the superscript t represents the continuous-time index. Clearly, under
leak-free conditions, the mass flow at both ends must be the same.

The spot of the leak occurrence, called the leak location $z_L$, can be
approximately determined by suitably estimating the coordinate of the crossing
point for the two lines shown in Fig. 21.2(a):

$$z_L = Z \left[ 1 + \frac{\tan\theta_{in}}{\tan\theta_{ex}} \right]^{-1}, \qquad (21.2)$$

where the zero coordinate is the pipe inlet (Fig. 21.1). Clearly, in order to
make this an estimate of the leak location $z_L$, it is necessary to assess the
inclinations of the two pressure curves.
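
The crossing-point rule can be sketched in a few lines. The snippet assumes that (21.2) reads z_L = Z [1 + tan θ_in / tan θ_ex]^(-1), with θ_in and θ_ex denoting the changes of inclination seen from the inlet and the outlet; the function name and the sample values are hypothetical.

```python
def leak_location(pipe_length: float, tan_theta_in: float, tan_theta_ex: float) -> float:
    """Leak coordinate measured from the inlet, following the assumed form of (21.2)."""
    if tan_theta_ex == 0:
        raise ValueError("the outlet inclination change must be non-zero")
    return pipe_length / (1.0 + tan_theta_in / tan_theta_ex)


# Equal inclination changes at both ends place the leak in the middle of the pipe.
print(leak_location(40.0, 1.0, 1.0))
# A three times larger change at the inlet moves the estimate towards the inlet.
print(leak_location(40.0, 3.0, 1.0))
```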



21.2.4. Physical model of fluid flow in the pipe 

A physical description of the dynamics of the mass flow in long pipelines is 
founded on nonstationary equations of continuity and motion for fluids (Guy, 
1967; Marques and Morari, 1988; Wylie and Streeter, 1993). In particular, 
such a model of the flow can be expressed in terms of space- and time- 
dependent variables of the pressure p and the flow rate q. Consequently, by 
virtue of the principles of the conservation of mass and momentum, under 
isothermal conditions, this analytical model can be presented as 



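$$\frac{A}{c^2}\,\frac{\partial p}{\partial t} + \frac{\partial q}{\partial z} = 0, \qquad (21.3)$$

$$\frac{1}{A}\,\frac{\partial q}{\partial t} + \frac{\partial p}{\partial z} + \frac{\lambda c^2}{2DA^2}\,\frac{q|q|}{p} + \frac{g \sin\alpha}{c^2}\,p = 0, \qquad (21.4)$$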

which makes a nonlinear distributed-parameter system of partial differential 
equations of the hyperbolic type, describing the pressure and mass flow at 
every spatial point along a long pipe at any moment of time. For this problem 
to be well posed, one boundary condition defined for each end of the pipeleg 
is required. 

The above model is also widely used as a basis for achieving different 
pipeline-related goals such as modelling, designing, controlling and optimi- 
sation (see, for instance, Fasol and Pohl, 1990; Guy, 1967; Heath and Blunt, 
1969; Marques and Morari, 1988; Messey, 1989; Tijsseling, 1996; Wylie and 
Streeter, 1993). 

When developing methods for solving any distributed-parameter system 
for state estimation problems, we are faced with the necessity of approximat- 
ing the infinite-dimensional system by a finite dimensional system. For linear 
distributed-parameter systems, rigorous analytical approaches are available 
in the literature, and thus lumping can be postponed until the stage of the 
implementation of the solution. For nonlinear distributed-parameter systems, 
however, no general results are available, and approximations or early lump- 
ing have to be applied. 

With the nonlinear distributed-parameter model (21.3)-(21.4), several 
methods can be used to solve it numerically, such as the method of char- 
acteristics, Galerkin techniques and the finite difference approach (Marques 
and Morari, 1988). 




21. Detecting and locating leaks in transmission pipelines 






21.3. Analytical leak detection and isolation 
methods for pipelines 

A fundamental concept behind analytical leak detection methods is to
distinguish the deviations of the process dynamics characteristic of the leak
occurrence from the dynamics of the normal operating conditions. Therefore,
the normal operating conditions have to be well determined in advance. One
of the possible ways to achieve this is through the use of analytical/functional
information (mathematical model) about the pipeline being monitored. Then
we compare the outputs of the mathematical model with the corresponding
observations for consistency. The resulting discrepancy, referred to as the
residue (residual), can be further analysed in order to recognise the leak
occurrence and to estimate the leak parameters. It is thus a typical application
of the general methodology known as analytical redundancy for fault detection
and isolation (Gertler, 1998; Isermann, 1984; Patton et al., 2000; Willsky,
1976), or as model-based fault diagnosis (see other chapters of this book).




Fig. 21.3. Leak detection and isolation by analytical redundancy 

The block diagram shown in Fig. 21.3 illustrates the fundamental func- 
tions of the analytical redundancy methodology used in a typical leak detec- 
tion scheme. 

The mathematical model of a process can, in general, be developed in two 
distinct ways. Analytical modelling consists in applying basic physical laws 
to the process components and utilising a known interconnection of these 
components. Synthetical or experimental modelling is based on a pre-selected 
mathematical relationship, which seems to explain the observed input-output 
data. In all cases, the choice of the model inputs and outputs is most essential. 
In the case of pipeline modelling, the input is naturally associated with pres- 
sure measurements and the output concerns mass flow rate measurements. In 
this study we apply analytical modelling in order to obtain the fluid dynamics 
both in leak-free pipes and in leaking ones based on the a priori information 
on the process and the measurements. To minimise the effects of modelling
errors, adaptive mechanisms have to be applied that can be based on the
available measurement information. The residuals shown in Fig. 21.3 are
derived from the difference between the analytically obtained model signals and
their corresponding measurement signals. They are zero in ideal situations.
In practice, however, this is seldom the case and their deviation from zero is a
combined result of noise, faults and other technological factors. If the noise is
negligible, the residues can be analysed directly. In the presence of significant
noise, a statistical analysis is necessary. In either case, logical patterns (leak
signatures⁴) differentiating between various kinds of the leaking status (leak
detection) and the leak-free ones are generated. The leak signatures are then
analysed with the purpose of estimating the leak parameters (leak isolation).

21.3.1. Volume balancing approach 

There is a simple leak detection approach, known as Improved Volume
Balancing (IBA), that makes use of a correlation technique applied to volume
balancing (Gunawickrama, 2001; Siebert and Isermann, 1977). Referencing
values describing the leak-free operational status of the pipe are there com- 
pared with their corresponding on-line measurements to generate discrepan- 
cies which characterise the leaking pipe dynamics. Immediately after the leak 
alarm appears, the reference signals are frozen, which permits a comparison 
of the pipe dynamics before and after the leak, and allows estimating the leak 
parameters (in terms of the size and location). 

A simulation-based verification (Gunawickrama, 2001) of performance- 
related aspects, important for a successful application of the IBA method 
to a specific pipeleg under known operating conditions, shows that this algo- 
rithm may not perform well and can lead to estimates of insufficient accuracy. 
Moreover, the reliability and robustness of IBA is quite questionable due to 
the evident weakness of describing the pipe dynamics during operational vari- 
ations by means of simple reference values. Due to large storage capacities 
and varying consumption, most (especially gas) pipelines rarely come to a 
steady state. It is rather unlikely that reference values determined via simple 
filtering would appropriately model such complex dynamics and lead to ac- 
curacies sufficient for leak detection and isolation objectives. It is thus clear 
that other leak detection techniques, which overcome the limitations of IBA, 
are necessary. 

A more promising solution is to apply an accurate mathematical model 
covering all the inherent dynamics (operational variations, continuous drifts 
of the operating point, static and dynamic disturbances, etc.) which influence 
the performance of leak detection schemes. 

21.3.2. Fault-sensitive and fault model approaches 

With the purpose of introducing more suitable algorithms, we can seek
numerical solutions by means of two methods of lumping the nonlinear
distributed-parameter pipe model: the finite difference scheme and the
method of characteristics. They lead to two different leak detection schemes.
When using the first method, a discrete-time discrete-space pipe model is
obtained via replacing both the time and space differential operators by
suitable centred implicit differences (discussed in Section 21.4). Alternatively,
with the second method, a numerical solution to the hyperbolic system is
achieved by applying the method of characteristics, which consists in finding
functions on the (z, t) plane along which the partial differential equations
reduce to exact ordinary differential equations. These equations can then be
expressed in the finite difference form, which also leads to a lumped
discrete-time discrete-space system (discussed in Section 21.5).

⁴ See Subsection 5.2.1, for instance.

Consequently, within the principal leak detection schemes considered 
here we shall name the following two analytical approaches, founded on dif- 
ferent ideas of modelling: 

• the fault-sensitive approach (using a leak-free pipe model), 

• the fault model approach (based on a leaking pipe model). 

In the first approach, the mathematical model describes solely the dy- 
namics of a leak-free pipeline (no leak is considered). Therefore, a systematic 
residual change develops if a leak occurs in the monitored pipeline. The leak 
size and location can then be estimated by suitably processing the gener- 
ated residues, which change in predetermined directions. Thus, taking into 
account the FDI framework, such a detection scheme can be classified as
fault-sensitive.

Alternatively, we can use a model covering the effect of leaks that leads 
to a fault model approach, in which information on leaks is mapped into 
‘residuals’ generated within the leaking pipe model. This makes a ‘direct’ 
estimation of leak parameters possible. 

Both methods require the pipeline to be described by an accurate math- 
ematical model. As there is never an exact agreement between the process 
and its model, the kind and size of model errors have to be well recognised 
so that techniques minimising the modelling errors can be made effective. 
Another general assumption is that in spite of the stochastic character of the
process, the generated residues include clear symptoms of a leak in the form
of a ‘significant’ change⁵, being large enough and lasting long enough to be
detectable. We shall also assume the on-line availability of the pressure and
flow rate measurements at the ends of the pipeleg.



21.4. Fault-sensitive approach 

A leak detection method presented in this section accomplishes leak detection
via leak-sensitive nonlinear state observation based on an advanced
mathematical model (Billmann and Isermann, 1987). The method is founded on a
nonlinear state observer, which determines the pipe’s state (dynamics)
corresponding to a leak-free operational status (condition). Due to the lack of leak
representation within the estimated state and the resulting overall
functionality, this detection scheme is characterised as the Fault-Sensitive Approach
(FSA). Its principal methodology is depicted in Fig. 21.4.

⁵ See, for instance, the examples in Chapter 13.




Fig. 21.4. Leak detection in the fault-sensitive approach



Pressure measurements at the pipe boundaries, $P_{in}$ and $P_{ex}$, are inputs
to the state observer, which, in turn, outputs the current pipe state composed
of the pressure and flow rate at predefined discrete locations along the pipe.
Inasmuch as the estimated state of the pipe contains the flow rates at the
pipe’s inlet and outlet ($\hat{Q}_{in}$, $\hat{Q}_{ex}$), we can compare them with their
corresponding measurements ($Q_{in}$, $Q_{ex}$) so as to produce the residual signals. As
has been described in Subsection 21.2.3, with the development of a leak the
residues start shifting in predetermined directions away from zero. This makes
the leak occurrence detectable and isolable via suitable techniques (even with
the balancing IBA method, mentioned in Subsection 21.3.1). In practical
situations, most parameters of the pipe model are known with a good accuracy.
The exception is the friction coefficient, which has to be estimated on-line via
a reliable identification method (Kowalczuk and Gunawickrama, 1998). This
leads to a nonlinear adaptive observation scheme, which is able to compensate
for the existing modelling errors.

21.4.1. Model of the transmission pipeleg

The objective now is to model the dynamics of the pipeleg under consid- 
eration. This can be achieved by lumping the distributed-parameter sys- 
tem (21.3) and (21.4) with the use of a suitable finite difference scheme, 
which approximately lumps energy storage or dissipation into a finite num- 
ber of discrete time and spatial locations. 

Lumping the distributed-parameter system 

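One possibility is to substitute for the differential operators in the
distributed-parameter system the following centred difference scheme (Billmann, 1982):

$$\left.\frac{\partial x}{\partial t}\right|_{d,k} \approx \frac{x_d^k - x_d^{k-1}}{\Delta t}, \qquad (21.5)$$

$$\left.\frac{\partial x}{\partial z}\right|_{d,k} \approx \frac{x_{d+1}^k - x_{d-1}^k + x_{d+1}^{k-1} - x_{d-1}^{k-1}}{4\Delta z}, \qquad (21.6)$$

where $k$ is a discrete-time index, $\Delta t$ denotes a time interval ($t = k\Delta t$), $d$
means a discrete-length index, and $\Delta z$ stands for a length interval ($z = d\Delta z$).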
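
To make the spatial operator concrete, the following minimal sketch evaluates the difference quotient as we read it from (21.6): the centred differences at the time steps k and k-1 are averaged, which gives the common denominator 4Δz. The function name and the list-based storage of the profiles are assumptions made for illustration only.

```python
def dz_centred(x_now, x_prev, d, dz):
    """Approximate dx/dz at node d from the profiles at steps k (x_now) and k-1 (x_prev)."""
    if d < 1 or d > len(x_now) - 2:
        raise IndexError("node d needs both neighbours d-1 and d+1")
    return (x_now[d + 1] - x_now[d - 1] + x_prev[d + 1] - x_prev[d - 1]) / (4.0 * dz)


# A profile that is linear in z (slope 2) and constant in time is differentiated exactly.
profile = [0.0, 2.0, 4.0, 6.0]
print(dz_centred(profile, profile, d=1, dz=1.0))
```

Because the quotient at node d uses only the neighbours d-1 and d+1, pressures and flow rates can be stored at alternating nodes, which is the arrangement used for the pipeleg model below.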

Discrete-time pipeleg model 

Let us consider a pipeleg of length Z hypothetically divided into sections as 
shown in Fig. 21.5. 



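[Figure: pipeleg divided into N sections of length Δz = Z/N; the boundary pressures are p_0 = P_in and p_N = P_ex, the pressures p_1, p_3, ..., p_{N-1} are assigned to the odd nodes and the flow rates q_0, q_2, ..., q_N to the even nodes]

Fig. 21.5. Subdividing a pipeleg into N sections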



The above allows us to describe the distance variable $z = dZ/N = d\Delta z$,
where $N$ is a natural (positive integer) even number. Consequently, we
measure the following pressure and mass flow rate quantities at the boundaries
(at the locations $d = 0$ and $d = N$):

$$p_0^k = P_{in}^k, \qquad p_N^k = P_{ex}^k,$$

$$q_0^k = Q_{in}^k, \qquad q_N^k = Q_{ex}^k.$$

State variable representation

The dynamics of the pipeleg at specific locations defined by $d$ and at a
current discrete-time $k$ can thus be written in the form of the following state
variable representation:

$$A x^k = B x^{k-1} + C(x^{k-1}, \lambda^k) + D u^{k-1} + E u^k, \qquad (21.7)$$

where the state vector

$$x^k = [\, q_0^k \;\; q_2^k \;\; q_4^k \;\; \dots \;\; q_N^k \;|\; p_1^k \;\; p_3^k \;\; p_5^k \;\; \dots \;\; p_{N-1}^k \,]^T \in \mathbb{R}^{N+1}$$

has the dimension $(N+1)$, the input vector is defined as

$$u^k = [\, p_0^k \;\; p_N^k \,]^T = [\, P_{in}^k \;\; P_{ex}^k \,]^T,$$

and the matrices $A$ to $E$ are certain functions of $\Delta t$, $\Delta z$ and other
parameters ($c$, $g$, $D$, $\alpha$) of the pipe transmission process (Gunawickrama, 2001), and,
additionally, the matrix $C$ depends both on the fluid friction coefficient $\lambda$
and on the internal states $q$ and $p$ of particular sections.

If all pipe parameters are known, and the last two states and pressure 
measurements at both pipeleg ends are available, then the current state can 
be computed on-line with the aid of the following nonlinear equation: 

$$x^k = A^{-1} \left[ B x^{k-1} + C(x^{k-1}, \lambda^k) + D u^{k-1} + E u^k \right]. \qquad (21.8)$$

Note that the system matrix A is constant for a given pipeleg (Gunawick- 
rama, 2001). Therefore, it is sufficient to verify the nonsingularity of A once, 
before implementing the system. 
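
A minimal sketch of one such update step is given below. It assumes the reading A x^k = B x^(k-1) + C(x^(k-1), λ) + D u^(k-1) + E u^k of the state equation and solves the linear system instead of forming the inverse of A; a singular A is reported as an error. All matrices in the usage lines are hypothetical toy values, not the pipe matrices of Gunawickrama (2001), and the friction-dependent term is passed in as a precomputed vector `c_term`.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("system matrix A is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def matvec(mat, vec):
    """Matrix-vector product for nested lists."""
    return [sum(a * v for a, v in zip(row, vec)) for row in mat]


def state_step(A, B, D, E, c_term, x_prev, u_prev, u_now):
    """Return x^k from x^(k-1), the friction term and the boundary pressures."""
    rhs = [bx + c + du + eu for bx, c, du, eu in zip(
        matvec(B, x_prev), c_term, matvec(D, u_prev), matvec(E, u_now))]
    return solve_linear(A, rhs)


# Toy two-state usage with made-up matrices.
A = [[2.0, 0.0], [0.0, 4.0]]
B = [[1.0, 0.0], [0.0, 1.0]]
Z2 = [[0.0, 0.0], [0.0, 0.0]]
print(state_step(A, B, Z2, Z2, [0.0, 0.0], [2.0, 4.0], [0.0, 0.0], [0.0, 0.0]))
```

Since A is constant for a given pipeleg, a practical implementation would factorise it once and reuse the factors at every step; the elimination is repeated here only to keep the sketch short.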

21.4.2. Residue generation via nonlinear state observation 

As can be seen from (21.7), the pipe state includes the flow rates $q_0^k$ and
$q_N^k$, which are also available from the on-line measurements $Q_{in}^k$ and
$Q_{ex}^k$. As the estimated state gives us a measure of the leak-free operation
of the pipeleg, the difference between the estimated and measured flow rates
becomes our diagnostic residue. Consequently, the resulting residue generator
based on nonlinear state observation can be described as follows:

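Nonlinear state observer

$$\hat{x}^k = A^{-1} \left[ B \hat{x}^{k-1} + C(\hat{x}^{k-1}, \hat{\lambda}^k) + D u^{k-1} + E u^k \right],$$

$$\begin{bmatrix} \hat{q}_0^k \\ \hat{q}_N^k \end{bmatrix} = \left[ \begin{array}{cccc|ccc} 1 & 0 & \dots & 0 & 0 & \dots & 0 \\ 0 & 0 & \dots & 1 & 0 & \dots & 0 \end{array} \right] \hat{x}^k. \qquad (21.9)$$

Diagnostic residue

$$e^k = \begin{bmatrix} \Delta q_0^k \\ \Delta q_N^k \end{bmatrix} = \begin{bmatrix} Q_{in}^k \\ Q_{ex}^k \end{bmatrix} - \begin{bmatrix} \hat{q}_0^k \\ \hat{q}_N^k \end{bmatrix}, \qquad (21.10)$$

where the sign ^ placed over a symbol indicates an estimate of the respective
variable.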
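
The residue generation can be sketched as follows. The snippet assumes that the state vector is ordered as the flow rates q_0, q_2, ..., q_N followed by the pressures p_1, ..., p_(N-1), so that the outlet flow estimate is element N/2 + 1 (index N/2 when counting from zero), and that the residue is taken as measurement minus estimate. The function name and the sample numbers are hypothetical.

```python
def residue(q_in_meas, q_ex_meas, x_hat, n_sections):
    """Return (dq0, dqN): measured minus estimated boundary flow rates."""
    if n_sections % 2 != 0:
        raise ValueError("the number of sections N must be even")
    q0_hat = x_hat[0]                 # first state: inlet flow estimate
    qn_hat = x_hat[n_sections // 2]   # state N/2 + 1 (1-based): outlet flow estimate
    return q_in_meas - q0_hat, q_ex_meas - qn_hat


# N = 4: three flow-rate states followed by two pressure states (made-up values).
x_hat = [120.0, 120.0, 120.0, 70.0, 65.0]
print(residue(121.8, 118.2, x_hat, 4))
```

With this sign convention a leak drives the inlet residue above zero and the outlet residue below zero, in line with the flow changes listed in Subsection 21.2.3.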



21.4.3. Leak parameters 

If the above algorithm makes a proper evaluation of the residual vector $e^k$,
under the leak-free operation of the pipe we have

$$e^k = [\, \Delta q_0^k \;\; \Delta q_N^k \,]^T \approx 0,$$

which allows applying the same elementary techniques of on-line leak
parameter estimation as those used in the balancing method IBA (Gunawickrama,
2001; Siebert and Isermann, 1977).

Alarm generation 

A primary detection task is a precise differentiation between the leak
occurrence and another possible operational status. An aptly sensitive algorithm
for alarm generation can be based (Gunawickrama, 2001; Siebert and Isermann,
1977) on the cross-correlation of the mass flow rate residues $\Delta q_0^k$ and
$\Delta q_N^k$, taken with the time shifts $\tau$, and computed recursively with the use of
fading memory, implemented by means of a weighting factor⁶ $0 < \eta < 1$:

$$\Phi_{0N}^k(\tau) = \eta\, \Phi_{0N}^{k-1}(\tau) + (1 - \eta) \left( \Delta q_0^{k-\tau}\, \Delta q_N^k \right). \qquad (21.11)$$

Additionally, the results are summed up for a restricted number of time shifts
$\tau$ $(1, 2, \dots, \tau_{max})$:

$$\Phi_{\Sigma}^k = \sum_{\tau=1}^{\tau_{max}} \Phi_{0N}^k(\tau), \qquad (21.12)$$

leading to the leak alarm if, for a certain time instant $k$, $\Phi_{\Sigma}^k$ falls below the
chosen threshold.

⁶ Larger values of $\eta$ cause improved smoothing (noise reduction) and delayed detection.
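
The alarm logic can be illustrated by the sketch below, which assumes that (21.11) updates each shifted cross-correlation as η times its previous value plus (1 - η) times the product of the inlet residue delayed by τ steps and the current outlet residue, and that the alarm is raised when the sum over τ = 1, ..., τ_max drops below a negative threshold. The class name, the zero initialisation of the history and the default constants (taken from Table 21.4) are assumptions.

```python
from collections import deque


class LeakAlarm:
    """Recursive cross-correlation of the flow residues with fading memory."""

    def __init__(self, eta=0.7, tau_max=25, threshold=-1.0):
        self.eta = eta
        self.tau_max = tau_max
        self.threshold = threshold
        self.phi = [0.0] * tau_max  # phi[tau - 1] holds the value for shift tau
        # Past inlet residues, newest last; assumed zero before monitoring starts.
        self.dq0_hist = deque([0.0] * tau_max, maxlen=tau_max)

    def update(self, dq0, dqn):
        """Feed one residue pair; return (sum over all shifts, alarm flag)."""
        for tau in range(1, self.tau_max + 1):
            delayed = self.dq0_hist[-tau]  # inlet residue at step k - tau
            self.phi[tau - 1] = (self.eta * self.phi[tau - 1]
                                 + (1.0 - self.eta) * delayed * dqn)
        self.dq0_hist.append(dq0)
        total = sum(self.phi)
        return total, total < self.threshold


alarm = LeakAlarm()
for step in range(5):
    total, fired = alarm.update(1.8, -1.8)  # residues of opposite sign, as under a leak
    print(step, round(total, 3), fired)
```

Residues of opposite sign give negative products, so the sum decreases under a leak, whereas operational changes that shift both residues in the same direction push it upwards.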
Leak location

For a pipeline with a constant height profile, in an asymptotic steady state,
when $\partial p/\partial t \to 0$ and $\partial q/\partial t \to 0$, the flow model given by (21.3) and (21.4)
can be reduced to

$$\frac{\partial q}{\partial z} = 0, \qquad (21.13)$$

$$\frac{\partial p}{\partial z} = -\frac{\lambda c^2}{2DA^2}\,\frac{q|q|}{p}. \qquad (21.14)$$

Assuming that the parameters $\lambda$, $c$ and $D$ are constant for the analysed
pipeline, it results from (21.14) that the ratio

$$\frac{q|q|}{p} \qquad (21.15)$$

is proportional to the angle of a linear pressure drop (see Fig. 21.2(a)) along
the pipe distance. On the basis of plain reasoning we conclude that also the
change of inclination observed in the pressure curves of the leaking pipeline
(Fig. 21.2(a)) can be described by a similar relationship, which is quadratic
in terms of the directionally positive mass flow.

Consequently, for a given time instant $k$ the above relation can be assessed
at the pipe inlet and outlet with the use of the estimated residues (21.10):

$$\tan\theta_{in}^k \propto \left( \Delta q_0^k\, \Delta q_0^k \right), \qquad (21.16)$$

$$\tan\theta_{ex}^k \propto \left( \Delta q_N^k\, \Delta q_N^k \right). \qquad (21.17)$$

By introducing an appropriate filtration of the above measurement-related
quantities, we can suitably compute the location of the leak according
to (21.2) as

$$z_L^k = Z \left[ 1 + \frac{\tan\theta_{in}^k}{\tan\theta_{ex}^k} \right]^{-1} = Z \left[ 1 + \frac{\Phi_{00}^k}{\Phi_{NN}^k} \right]^{-1}, \qquad (21.18)$$

where we utilise the corresponding limited sums (aggregate auto-correlations):

$$\Phi_{00}^k = \sum_{\tau} \Phi_{00}^k(\tau), \qquad (21.19)$$

$$\Phi_{NN}^k = \sum_{\tau} \Phi_{NN}^k(\tau), \qquad (21.20)$$

whose component terms are auto-correlation functions computed recursively
with the use of a weighting factor $0 < \mu < 1$:

$$\Phi_{00}^k(\tau) = \mu\, \Phi_{00}^{k-1}(\tau) + (1 - \mu) \left( \Delta q_0^{k-\tau}\, \Delta q_0^k \right),$$

$$\Phi_{NN}^k(\tau) = \mu\, \Phi_{NN}^{k-1}(\tau) + (1 - \mu) \left( \Delta q_N^{k-\tau}\, \Delta q_N^k \right).$$
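
The location estimate can be sketched as below. The snippet rests on two assumptions: that (21.18) has the form z_L = Z [1 + Φ_00 / Φ_NN]^(-1), and that the aggregate sums run over the shifts τ = 1, ..., τ_max, as in the alarm generator. The function name, the zero initial history and the sample residues are hypothetical.

```python
def locate_leak(dq0_series, dqn_series, pipe_length, mu=0.9, tau_max=25):
    """Estimate the leak coordinate from recursively averaged auto-correlations."""
    phi00 = [0.0] * tau_max
    phinn = [0.0] * tau_max
    hist0 = [0.0] * tau_max  # past inlet residues, newest last
    histn = [0.0] * tau_max  # past outlet residues, newest last
    for dq0, dqn in zip(dq0_series, dqn_series):
        for tau in range(1, tau_max + 1):
            phi00[tau - 1] = mu * phi00[tau - 1] + (1.0 - mu) * hist0[-tau] * dq0
            phinn[tau - 1] = mu * phinn[tau - 1] + (1.0 - mu) * histn[-tau] * dqn
        hist0 = hist0[1:] + [dq0]
        histn = histn[1:] + [dqn]
    sum00, sumnn = sum(phi00), sum(phinn)
    if sumnn == 0:
        raise ValueError("outlet auto-correlation is zero; location is undefined")
    return pipe_length / (1.0 + sum00 / sumnn)


# Residues of equal magnitude at both ends point to the middle of a 40 km pipeleg.
print(locate_leak([1.8] * 200, [-1.8] * 200, 40.0))
```

Because only the ratio of the two sums enters the result, the unknown proportionality constant of (21.16) and (21.17) cancels out.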
Leak size

A useful estimate of the size of the leak can be obtained from a simple
dynamic balance equation (see Figs. 21.2(b) and 21.3, as well as the
equation (21.10)):

$$\hat{q}_L^k = E\{ \Delta q_0^k - \Delta q_N^k \}, \qquad (21.21)$$

where $E\{\cdot\}$ denotes the operator of the expected value of the analysed
ergodic process.
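
For an ergodic process the expected value can be approximated by a time average. The sketch below assumes that the balance takes the form q_L ≈ E{Δq_0 - Δq_N}, which follows from (21.1) when the residues (21.10) are taken as measurement minus estimate; the function name and the data are hypothetical.

```python
def leak_size(dq0_series, dqn_series):
    """Sample-mean approximation of E{dq0 - dqN} over the supplied window."""
    diffs = [a - b for a, b in zip(dq0_series, dqn_series)]
    if not diffs:
        raise ValueError("at least one residue pair is required")
    return sum(diffs) / len(diffs)


# Inlet residue +1.8 kg/s and outlet residue -1.8 kg/s over ten samples.
print(leak_size([1.8] * 10, [-1.8] * 10))
```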

21.4.4. On-line estimation of the friction coefficient 

In practice, all pipe parameters but the friction coefficient $\lambda$ in (21.7) are
known with acceptable accuracy. The friction coefficient, which measures
the resistance encountered by the fluid being pushed through the pipe, is
immeasurable and depends on various process parameters such as the fluid
density, viscosity, average velocity, temperature, wall characteristics and
diameter of the pipe, etc. (Czetwertynski and Utrysko, 1968). Therefore, $\lambda$ is
a time- and space-dependent parameter, because it may vary in time and
can have different values in different locations along the pipeleg. Thus the
friction coefficient needs to be estimated via a suitable parameter identification
scheme.

Estimation of multiple friction coefficients 

While considering the time- and space-dependent nature of $\lambda$, we find out
that the state space representation (21.7), modelling the dynamics at $N$
discrete points along the pipeleg (see Fig. 21.5), contains only $N/2 + 1$
different friction coefficients ($\lambda_0^k, \lambda_2^k, \dots, \lambda_N^k$). They are associated with
certain elements of the matrix $C$, and concern the selected points ($d =
0, 2, 4, \dots, N$) along the pipeline and their corresponding momentum balance
equations (21.4).

We begin the development of the estimation scheme with the extraction
of the friction coefficients, entailed in the model (21.7)-(21.8), out of the term

$$C(x^{k-1}, \boldsymbol{\lambda}^k) = C_a x^{k-1} + C_c(x^{k-1})\, \boldsymbol{\lambda}^k.$$

It thus appears that $C(x^{k-1}, \boldsymbol{\lambda}^k)$ is an affine function of $\boldsymbol{\lambda}^k =
[\, \lambda_0^k \;\; \lambda_2^k \;\; \dots \;\; \lambda_N^k \,]^T$, which leads to

$$x^k = A^{-1} \left[ B x^{k-1} + C_a x^{k-1} + D u^{k-1} + E u^k \right] + A^{-1} C_c(x^{k-1})\, \boldsymbol{\lambda}^k, \qquad (21.22)$$

which can be transformed into the form of a linear static vector equation:

$$x^k = \bar{x}^k + M^k \boldsymbol{\lambda}^k, \qquad (21.23)$$

where

$$\bar{x}^k = A^{-1} \left[ B x^{k-1} + C_a x^{k-1} + D u^{k-1} + E u^k \right],$$

$$M^k = \left[ A^{-1} C_c(x^{k-1}) \right] \in \mathbb{R}^{(N+1) \times (N/2+1)}.$$

The equation (21.23) has the form of a linear regression, which allows us
to identify the vector $\boldsymbol{\lambda}^k$ by applying one of the known parameter estimation
schemes such as recursive or nonrecursive least squares.
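
As an illustration of the recursive variant, the sketch below implements textbook recursive least squares for a single unknown parameter in a regression of the form y = m λ. This scalar case is a deliberate simplification of the vector regression (21.23), chosen only to keep the example short; the class name, the initial covariance and the synthetic data are assumptions.

```python
class ScalarRLS:
    """Recursive least squares for one parameter, with optional forgetting."""

    def __init__(self, theta0=0.0, p0=1e3, forgetting=1.0):
        self.theta = theta0           # current parameter estimate
        self.p = p0                   # scalar "covariance" of the estimate
        self.forgetting = forgetting  # 1.0 means no forgetting

    def update(self, regressor, observation):
        """Process one pair (m, y) and return the updated estimate."""
        gain = self.p * regressor / (self.forgetting + regressor * self.p * regressor)
        self.theta += gain * (observation - regressor * self.theta)
        self.p = (self.p - gain * regressor * self.p) / self.forgetting
        return self.theta


# Synthetic noise-free data generated with lambda = 0.0345.
rls = ScalarRLS()
for m in (1.0, 2.0, 3.0, 4.0, 5.0):
    estimate = rls.update(m, 0.0345 * m)
print(estimate)
```

In the vector case the scalar gain and covariance become a gain vector and a covariance matrix, which is where the computational effort discussed next comes from.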

The estimation of multiple friction coefficients calls, however, for
multidimensional or parallel estimators, which require a great computational
effort. Due to a small number of available measurements (as compared to the
number of parameters needed to be estimated), the results of identification
may have a high degree of undesirable uncertainty. Even with various
carefully driven multidimensional schemes (Kowalczuk and Gunawickrama, 1998)
for identifying $\boldsymbol{\lambda}^k$, the risk that the resulting updated state space observation
process will become unstable cannot be fully eliminated.



Generalised friction coefficient 

In order to obtain an effective and reliable state estimation scheme, let us 
introduce the following procedure, which considerably decreases the system’s 
structural complexity (and computational effort) and increases our control 
over the identification algorithm. 

Firstly, we assume that the net effect of fluid resistance during
transportation is modelled by a single time-variant parameter $\lambda = \lambda^k$, i.e., its
spatial dependency is neglected and only its time-dependent nature is taken
into account. The vector of the friction coefficients then becomes

$$\boldsymbol{\lambda}^k = [\, \lambda^k \;\; \lambda^k \;\; \dots \;\; \lambda^k \,]^T. \qquad (21.24)$$

This assumption appears reasonable, as the minimisation of modelling errors
in state variable representation is more important than an accurate
identification of the friction coefficients along the pipeline.

From the vector matrix equation (21.23) we can derive two independent
static equations for the flow rates at the inlet and outlet points (state
variables) of the pipeleg:

$$q_0^k(\lambda) = x^k[1] = \bar{x}^k[1] + \lambda^k \sum_{i=1}^{N/2+1} M^k[1, i],$$

$$q_N^k(\lambda) = x^k[N/2+1] = \bar{x}^k[N/2+1] + \lambda^k \sum_{i=1}^{N/2+1} M^k[N/2+1, i].$$

By comparing the above quantities with their respective measurements we
create error functions:

$$e_0^k(\lambda^k) = \left\{ Q_{in}^k - q_0^k(\lambda^k) \right\}^2,$$

$$e_N^k(\lambda^k) = \left\{ Q_{ex}^k - q_N^k(\lambda^k) \right\}^2.$$

Minimising the error functions with respect to $\lambda = \lambda^k$,

$$\frac{\partial e_0^k(\lambda)}{\partial \lambda} = 0 \qquad (21.25)$$

and

$$\frac{\partial e_N^k(\lambda)}{\partial \lambda} = 0, \qquad (21.26)$$

leads to two independent nonrecursive (static) estimators of the friction
coefficient:

$$\hat{\lambda}_0^k = \frac{Q_{in}^k - \bar{x}^k[1]}{\sum_{i=1}^{N/2+1} M^k[1, i]}, \qquad (21.27)$$

$$\hat{\lambda}_N^k = \frac{Q_{ex}^k - \bar{x}^k[N/2+1]}{\sum_{i=1}^{N/2+1} M^k[N/2+1, i]}. \qquad (21.28)$$
The two estimates of the friction coefficient can be combined (for
instance, by averaging) into a unique value, which generalises the effect of fluid
resistance along the entire pipeleg. Practically, it is recommended to
compute the applicable value of the generalised friction coefficient $\lambda = \lambda^k$ via
recursive averaging with fading memory, so as to include the dynamics (or
rational inertia based on the history) of the process:

$$\hat{\lambda}^k = \varsigma\, \hat{\lambda}^{k-1} + (1 - \varsigma)\, \frac{\hat{\lambda}_0^k + \hat{\lambda}_N^k}{2}, \qquad (21.29)$$

where $\varsigma$ denotes a suitably chosen weighting factor.
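
The two-stage procedure can be sketched as follows. The first function implements the static estimators as we read them from (21.27) and (21.28), that is, the measured flow rate minus the friction-free prediction, divided by the corresponding row sum of M. The second function assumes that the two estimates are combined by plain averaging before entering the fading-memory filter. All names and numbers are hypothetical.

```python
def static_lambda(q_meas, x_bar_elem, m_row):
    """Static friction estimate from one boundary flow rate."""
    denom = sum(m_row)
    if denom == 0:
        raise ValueError("row sum of M is zero; the estimate is undefined")
    return (q_meas - x_bar_elem) / denom


def update_lambda(lam_prev, lam_in, lam_ex, weight=0.9):
    """Fading-memory average of the mean of the inlet and outlet estimates."""
    return weight * lam_prev + (1.0 - weight) * 0.5 * (lam_in + lam_ex)


lam_in = static_lambda(5.0, 3.0, [1.0, 3.0])   # made-up inlet data
lam_ex = static_lambda(4.0, 2.0, [2.0, 2.0])   # made-up outlet data
print(update_lambda(0.5, lam_in, lam_ex))
```

A weight close to one makes the generalised coefficient follow the static estimates slowly, which suppresses measurement noise at the cost of a delayed reaction to genuine changes.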

Another advantage of this approach is that the estimated friction coef- 
ficient does not change the steady-state solution of the mass balance (21.3). 
Thus, with the assumption that immediately after triggering the leak alarm 
the friction coefficient is frozen (its estimation is stopped) , the observer should 
not compensate for the leak effects. 

21.4.5. Exemplary monitoring with the use of the FSA-LDS 

The dynamics of a pipeleg that correspond to various operating conditions
were simulated by means of a pipe simulator⁷ (Gunawickrama, 2001), the
main purpose of which is to generate the emulated pressures and flow rates at
the boundaries of the examined pipeleg ($P_{in}^k$, $P_{ex}^k$, $Q_{in}^k$ and $Q_{ex}^k$), which are
input to the LDS system as the measured quantities.

⁷ Or an emulator, as it emulates in real-time the model reaction of a suitably described
pipeleg.

Let us consider a noisy process of the gas flow through a pipeleg
technically described by the parameters listed in Table 21.2. Under the initial
line pressure drop of 20 [bar] from the inlet to the outlet, the gas was moved
through at an average rate of 120 [kg/s]. The level of the measurement (white)
noise was assessed as three standard deviations. The sensor resolution
(precision) was assumed to be known precisely. The measurement-related
characteristics of the monitoring process are given in Table 21.3.



Table 21.2. Parameters of the monitored pipeleg

Parameter              Symbol   Value    Unit
Length                 Z        40.0     [km]
Diameter               D        639      [cm]
Velocity of sound⁸     c        304.23   [m/s]
Friction coefficient   λ        0.0345   —
Angle of inclination   α        0.00     [rad]

Table 21.3. Process measurement characteristics

Parameter                        Symbol   Approximate value       Unit
Initial pressure at the inlet    P_in     80.35                   [bar]
Initial pressure at the outlet   P_ex     60.45                   [bar]
Average flow rate                Q        (Q_in + Q_ex)/2 = 120   [kg/s]
Pressure noise level                      0.01                    [bar]
Pressure resolution                       0.001                   [bar]
Flow rate noise level                     0.1                     [kg/s]
Flow rate resolution                      0.01                    [kg/s]
Sampling interval                T_s      10                      [s]



Recall that the FSA algorithm consists of three principal modules
cooperating in terms of the observation of the pipe state, the estimation of the
generalised friction coefficient, and the estimation of the leak parameters. A
proper setting of the FSA parameters is crucial for a successful leak detection.
A set of parameters suitable for the analysed case is listed in Table 21.4, with
some of the algorithmic constants found on the trial-and-error basis.

⁸ A specific value of c determines whether the fluid is a gas or a liquid.



Table 21.4. Parameters of the fault-sensitive algorithm applied

Parameter                      Symbol   Value                            Unit
State observer
  Pipe length                  Z        40.0                             [km]
  Pipe diameter                D        639                              [cm]
  Velocity of sound            c        304.23                           [m/s]
  Friction coefficient         λ        0.0345                           —
  Angle of inclination         α        0.00                             [rad]
  Number of sections           N        20                               —
  Time interval (= T_s)        Δt       10                               [s]
Friction coefficient estimator
  Estimation status                     Off (λ is not updated on-line)   —
  Initial value                λ^0      0.0345                           —
  Weighting factor                      0.9                              —
Leak parameter estimator
  Alarm threshold                       -1.00                            —
  Weighting of Φ_0N            η        0.7                              —
  Weighting of Φ_00 and Φ_NN   μ        0.9                              —
  Memory length                τ_max    25                               —



Note that the parameters of the state observer modelling the pipeleg are
$\hat{Z}$, $\hat{D}$, $\hat{p}$, $\hat{c}$, $\hat{\lambda}$ and $\hat{\alpha}$. As is shown in Table 21.2, their values were set as
equal to the corresponding parameters of the real pipeleg $Z$, $D$, $p$, $c$, $\lambda$ and
$\alpha$. This means that the parameters of the pipeleg were known accurately and
no modelling errors existed in this respect. As the friction coefficient $\lambda$ was
assumed to be constant, its estimation was switched off.

The two most important observer parameters are the time interval (or the
FSA iteration period) $\Delta t$ and the number of pipe sections $N$; they are key
parameters in the process of discretising the pipe model in time and space.
The values of $\Delta t$ and $N$ must be chosen carefully, as inappropriate values
can completely mask the real dynamics of the pipeline. The iteration interval
$\Delta t$ should be assigned a value close (if not equal) to the sampling interval $T_s$
of measurements. Then, with a suitable choice of $N$, a well-stabilised observer
can be obtained that matches the inherent pipe dynamics with acceptable
accuracy. In the case considered, the number of pipe sections of the model
was set to twenty ($N = 20$), and the measurements were supplied to the
FSA-LDS system every 10 [s] ($\Delta t = T_s$).

An important characteristic of the gas flow is the drift of the operating
point. As a matter of fact, the gas pipe dynamics continuously drift and
never really come to a stationary steady state. In order to better validate the
FSA-LDS system, we consider its functionality under varying kinds of
operational status (Kowalczuk and Gunawickrama, 2001). The first technological
variation of the operating point was simulated as an input pressure increment
of 4 [bar] at the time moment $t_{OP1}$ = 15 [min]. To test the system’s leak
detection capability, a leak described by the parameters given in Table 21.5
was originated at $t_L$ = 75 [min]. After $t_{OP2}$ = 175 [min] of the test, another
variation of the operating point via an input pressure decrement of 2 [bar]
was generated so as to examine its influence on the system estimates. The
obtained results are depicted in Fig. 21.6.



Table 21.5. Parameters of the analysed leak

Parameter                  Symbol   Value   Unit
Size                       q_L      3.6     [kg/s]
Location                   z_L      20      [km]
Rising time                T_Lr     5       [min]
Moment of the occurrence   t_L      75      [min]



The first operational variation at $t_{OP1}$ (Fig. 21.6(a)) produced an
increment in the average flow rate by 15 [kg/s] shown in Fig. 21.6(c). The
corresponding reaction of the pipe outlet is clearly minor (Fig. 21.6(b), (d)).
The disturbing transient of the pipe operating point was almost perfectly
ignored by the observer keeping both the residuals $\Delta q_0$ and $\Delta q_N$ near zero.
That is, despite the shift in the operating conditions, the observer continues
to describe the leak-free pipe dynamics well and, thus, appears to be suitable
for leak detection.

The effect of the rarefaction wave, resulting from the leak occurrence, on 
the measurements can be easily seen from the measured flow rate trajectories 
(Figs. 21.6(c), (d)). As the leak was located exactly in the middle of the 
monitored pipeleg, the changes in the flow rate magnitudes (with respect to 
the leak-free dynamics) are approximately equal. The pressure values at the 
pipe boundaries decreased marginally (Figs. 21.6(a), (b)). 

Robustness to varying operating conditions was also confirmed by a
successful detection and isolation of the leak shown in Figs. 21.6(e), (f) and (g).
After the second operational variation at $t_{OP2}$, the pipe observer again
adapts well to the new operational status without losing the information on the
leak (notice the residues $\Delta q_0$ and $\Delta q_N$ after $t_{OP2}$). After some small
overshoot (caused by inaccurate estimates of $q_0$ and $q_N$ during a short transient
period), the leak estimates soon stabilise around their proper values.

Fig. 21.6. Leak detection via FSA under varying operating conditions with the
generic time moments: $t_{OP1}$ - first variation, $t_L$ - leak, $t_{OP2}$ - second variation

In conclusion, we recall that the analysed fault-sensitive approach to 
leak detection and isolation is based on adaptive nonlinear pipe observation 
and a cross-correlation technique applied to leak-parameter-dependent 
residues. Once the corresponding parameters are properly set, the observer 
output describes the leak-free pipe dynamics with acceptable tolerance 
(residuals are approximately zero). In practice, this may be difficult due to 
inaccurate knowledge of the pipe parameters. Therefore, a reliable adaptive 
mechanism based on the identification of the immeasurable generalised friction 
coefficient is introduced so as to effectively compensate for unavoidable 
modelling errors. The leak occurrence is detected when the cross-correlation 
sum of the residues exceeds a certain threshold, whose value is closely 
related to the system’s sensitivity. If the observer identifies the pipe dynamics 
well, the accuracy of the leak parameter estimates mostly depends on the 
stochastic nature of the process (related to measurement variations, noise 
effects, dynamic and static disturbances, etc.). Clearly, the FSA-LDS system 
is capable of differentiating between the leaking pipe dynamics and other op- 
erational variations (reliability, no false alarms) , and can continue to function 
properly despite the varying conditions of the pipe operation (robustness). 



21.5. Fault model approach 

In this section, we shall describe a leak detection scheme based on the work of 
Benkherouf and Allidina (1988), which can be characterised as the nonlinear 
state observation of structurally modelled leaks and can thus be categorised 
as the fault-model approach (FMA) to FDI (LDS). This type of modelling makes 
a conceptual difference between this method and those presented previously 
(the IBA and the FSA). On the assumption that leakages exist 
continuously at a number of known pipe locations (structural modelling), 
the mathematical model applied is capable of assessing the pipe dynamics 
under the leak-free and leaking operating conditions. The expected true leak 
parameters are obtained by means of superposition applied to the estimated 
pipe state quantities describing the mass flow of hypothetical structurally 
modelled leaks. 

The main functional blocks of an LDS system founded on the FMA are 
depicted in Fig. 21.7. 

A nonlinear state observer, based on a discrete-time discrete-space 
analytical model of pipe dynamics covering both the leak-free and leaking 
operational statuses, is fed with the input $u^k$, which consists of the pressure 
and flow rate measured on-line at the inlet and outlet, respectively. The 
observer output, equivalent to the current pipe state $x^k$, consists of the 
pressure, flow rate and structurally modelled leak flow rate values at 
predefined pipeline locations. During the leak-free operation, the structurally 
modelled leak flow rates stay near zero (ideally, they should be equal to zero). 
In the case of a true leak in the analysed pipe, the structurally modelled 
leak flow rates start shifting away from zero and the superposition technique 
is sufficient for obtaining the desired leak parameters. The measured pipe 
output $y^k$ marked in Fig. 21.7 is used to minimise the pipe state estimation 
error via the extended Kalman filtering applied to a model linearised 
around a current (monitored on-line) stationary state $\bar{x}$ (Benkherouf and 
Allidina, 1988; Kowalczuk and Gunawickrama, 2000). An updated state $\hat{x}^k$ 
of the pipe is then fed back to the observer to give initial conditions for the 
next time instant. This idea can be interpreted as an adaptive mechanism for 
minimising overall modelling errors. 

Fig. 21.7. Leak detection in the fault model approach 



21.5.1. Mathematical model of the pipeleg 

In the following, we shall derive a mathematical description that can accu- 
rately model a given pipeleg under the anticipated working conditions. We 
shall consider the distributed parameter system (21.3)-(21.4) of the fluid 
flow through a leak-free pipe together with a complementary static equation 
modelling a leak of assumed parameters (size and location) as a simplified 
description of the leaking pipe. A numerical solution to the distributed pa- 
rameter system will be obtained via the method of characteristics, which, 
in essence, consists in using such functions on the (z,t) plane along which 
the partial differential equations mentioned above reduce to exact ordinary 
differential equations. These will then be expressed in the finite difference 
form, which in effect leads to a lumped nonlinear discrete-space discrete-time 
system, modelling the dynamics of a given pipeleg under both the leak-free 
and leaking conditions. 



Description of the leaking pipe 

Consider the distributed parameter system (21.3) and (21.4) written for a 
pipeline with a constant height profile: 

$$\frac{\partial p}{\partial t} + \frac{c^2}{A}\frac{\partial q}{\partial z} = 0, \tag{21.30}$$

$$\frac{\partial q}{\partial t} + A\frac{\partial p}{\partial z} + \frac{\lambda c^2}{2DA}\frac{q|q|}{p} = 0. \tag{21.31}$$

If a leak develops at a pipe distance $z_L$, the above equations are valid 
only for $z \neq z_L$. Thus in the nearest neighbourhood of $z_L$, i.e., 
$[z_L^-, z_L^+]$, a discontinuity in the mass flow rate occurs (see Fig. 21.8 
presented below). The conservation of mass at $z_L \in [z_L^-, z_L^+]$ requires that 

$$q_{z_L^-} - q_{z_L^+} = q_L. \tag{21.32}$$

With this we assume that the leak introduces a negligible momentum in the 
$z$ direction, so that the equation (21.31) is unaffected for $z = z_L$. 



Method of characteristic lines 

A manageable system of equations can be obtained by finding linear 
functions on the $(z,t)$ plane along which the partial differential equations 
reduce to ordinary differential equations. Such a technique is known as the 
method of characteristic lines. 

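By multiplying (21.31) by a real nonzero $r$, 

$$r\frac{\partial q}{\partial t} + rA\frac{\partial p}{\partial z} + r\frac{\lambda c^2}{2DA}\frac{q|q|}{p} = 0, \tag{21.33}$$

the equations (21.30) and (21.31) can be linearly combined as 

$$\left(\frac{\partial p}{\partial t} + rA\frac{\partial p}{\partial z}\right) + r\left(\frac{\partial q}{\partial t} + \frac{c^2}{rA}\frac{\partial q}{\partial z}\right) + r\frac{\lambda c^2}{2DA}\frac{q|q|}{p} = 0. \tag{21.34}$$

When the value of $r$ is chosen such that 

$$\frac{dz}{dt} = rA = \frac{c^2}{rA}, \quad \text{i.e.,} \quad r = \pm\frac{c}{A} \quad \text{and} \quad \frac{dz}{dt} = \pm c, \tag{21.35}$$

the parenthetical terms in (21.34) become the time derivatives of the pressure 
and flow rate: 

$$\left(\frac{\partial p}{\partial t} + rA\frac{\partial p}{\partial z}\right) = \left(\frac{\partial p}{\partial t} + \frac{\partial p}{\partial z}\frac{dz}{dt}\right) = \frac{dp}{dt},$$

$$\left(\frac{\partial q}{\partial t} + \frac{c^2}{rA}\frac{\partial q}{\partial z}\right) = \left(\frac{\partial q}{\partial t} + \frac{\partial q}{\partial z}\frac{dz}{dt}\right) = \frac{dq}{dt}.$$

Consequently, by utilising the above in (21.34), the following equations for 
$\pm r$ are obtained: 

$$\frac{dp}{dt} + \frac{c}{A}\frac{dq}{dt} + \frac{\lambda c^3}{2DA^2}\frac{q|q|}{p} = 0 \quad \text{for } r = +\frac{c}{A}, \tag{21.36}$$

$$\frac{dp}{dt} - \frac{c}{A}\frac{dq}{dt} - \frac{\lambda c^3}{2DA^2}\frac{q|q|}{p} = 0 \quad \text{for } r = -\frac{c}{A}. \tag{21.37}$$

Thus the relationship given by (21.35) defines the characteristic 
functions on the $(z,t)$ plane along which the partial differential equation system 
(21.30)-(21.31) becomes the ordinary differential equation system (21.36)-(21.37). 
As the isothermal conditions are assumed, the velocity of sound $c$ is 
constant and the characteristics are linear functions with the slopes $+c$ and 
$-c$. In physical terms, these lines represent the propagation of forward and 
reflected waves with the velocities $+c$ and $-c$, respectively. The solution at 
a given point $(z,t)$ is the superposition of such waves reaching the point $z$ 
at the time $t$. 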

In order to discretely integrate the system of equations (21.36)-(21.37), 
we divide the $(z,t)$ plane into equal meshes of the pipe section length $\Delta z$ 
and the time span $\Delta t$, as shown in Fig. 21.8. 

Fig. 21.8. Discretisation in the discrete space d and the discrete time k 



With an assumed number $M$ of grid points, the analysed pipeleg of the 
length $Z$ contains $(M-1)$ hypothetical pipe sections. Hence 
$\Delta z = Z/(M-1)$, and the time mesh is chosen as $\Delta t = \Delta z/c$ so 
that the grid points lie on the characteristic lines. Thus each grid point can 
be described by the discrete space $d$ and the discrete time $k$, defined as 
$z = (d-1)\Delta z$ for $d = 1, 2, 3, \ldots, M$, and $t = k\Delta t$ for 
$k = 0, 1, 2, \ldots$ The effect of the above discretisation is illustrated in 
Fig. 21.8, where the grid points $P1 = (d, k)$, $P2 = (d-1, k-1)$, 
$P3 = (d+1, k-1)$ and their associated variables are distinguished. The points 
P2 and P3 are shifted in space by $2\Delta z$ and are concurrent in time, while 
P1 is equidistant in space from both P2 and P3, and advanced in time by 
$\Delta t$. With this spacing, the edges from P2 to P1 and from P3 to P1 are the 
positive and negative characteristic lines, respectively, labelled with their 
respective slopes $+c$ and $-c$. 
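The grid construction can be checked numerically with a minimal Python sketch. The helper name `characteristic_grid` is hypothetical, and the example values ($Z = 40$ km, $M = 6$, $c = 300$ m/s) are the ones quoted for the simulated pipeleg later in this section.

```python
def characteristic_grid(pipe_length_m, num_points, sound_speed_ms):
    """Return (dz, dt, z_positions) for a grid lying on the characteristic lines.

    Hypothetical helper: num_points is M, so the pipe has M - 1 sections.
    """
    if num_points < 2:
        raise ValueError("at least two grid points are needed")
    dz = pipe_length_m / (num_points - 1)   # section length, dz = Z / (M - 1)
    dt = dz / sound_speed_ms                # time mesh, dt = dz / c
    # grid positions z = (d - 1) * dz for d = 1, ..., M
    z_positions = [(d - 1) * dz for d in range(1, num_points + 1)]
    return dz, dt, z_positions


dz, dt, z_positions = characteristic_grid(40_000.0, 6, 300.0)
print(dz, round(dt, 2), z_positions)
```

Because $\Delta t$ is tied to $\Delta z$ through the speed of sound, a finer spatial grid automatically shortens the iteration period of the observer.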

At each grid point, we assume a structurally modelled hypothetical leak 
with the parameters 

$$q_{Ld} = q_{d^-} - q_{d^+}, \tag{21.38a}$$

$$z_{Ld} = (d-1)\Delta z, \tag{21.38b}$$

where $q_{d^-}$ and $q_{d^+}$ denote the incoming and outgoing mass flow rates 
through the pipe at the location $d$, respectively. 



Discrete model of the pipeleg 

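The necessary discrete integration of (21.36)-(21.37) carried out at the point 
$(d, k)$ according to the scheme of Fig. 21.8 (with the friction term integrated 
by the trapezoidal rule) results in, respectively, 

$$\left(p_d^k - p_{d-1}^{k-1}\right) + \frac{c}{A}\left(q_{d^-}^k - q_{(d-1)^+}^{k-1}\right) + \frac{\lambda c^3 \Delta t}{4DA^2}\left(\frac{q_{d^-}^k\,|q_{d^-}^k|}{p_d^k} + \frac{q_{(d-1)^+}^{k-1}\,|q_{(d-1)^+}^{k-1}|}{p_{d-1}^{k-1}}\right) = 0, \tag{21.39}$$

$$\left(p_d^k - p_{d+1}^{k-1}\right) - \frac{c}{A}\left(q_{d^+}^k - q_{(d+1)^-}^{k-1}\right) - \frac{\lambda c^3 \Delta t}{4DA^2}\left(\frac{q_{d^+}^k\,|q_{d^+}^k|}{p_d^k} + \frac{q_{(d+1)^-}^{k-1}\,|q_{(d+1)^-}^{k-1}|}{p_{d+1}^{k-1}}\right) = 0. \tag{21.40}$$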
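To make the structure of the two difference equations concrete, the following Python sketch evaluates their left-hand sides for a single grid point. It assumes that the friction term of (21.36)-(21.37) is integrated along each characteristic by the trapezoidal rule and that all quantities are given in SI units; the function names are hypothetical and the numbers in the usage lines are purely illustrative.

```python
def friction_term(q, p):
    """q|q|/p, the nonlinear friction contribution at one grid point."""
    return q * abs(q) / p


def positive_characteristic_residual(p_d, q_d_in, p_prev, q_prev_out,
                                     c, A, D, lam, dt):
    """Left-hand side of the difference equation along the +c line (P2 -> P1).

    p_d, q_d_in        : pressure and incoming flow at (d, k)
    p_prev, q_prev_out : pressure and outgoing flow at (d - 1, k - 1)
    """
    k_f = lam * c**3 * dt / (4.0 * D * A**2)   # assumed trapezoidal weight
    return ((p_d - p_prev) + (c / A) * (q_d_in - q_prev_out)
            + k_f * (friction_term(q_d_in, p_d)
                     + friction_term(q_prev_out, p_prev)))


def negative_characteristic_residual(p_d, q_d_out, p_next, q_next_in,
                                     c, A, D, lam, dt):
    """Left-hand side of the difference equation along the -c line (P3 -> P1).

    p_d, q_d_out       : pressure and outgoing flow at (d, k)
    p_next, q_next_in  : pressure and incoming flow at (d + 1, k - 1)
    """
    k_f = lam * c**3 * dt / (4.0 * D * A**2)   # assumed trapezoidal weight
    return ((p_d - p_next) - (c / A) * (q_d_out - q_next_in)
            - k_f * (friction_term(q_d_out, p_d)
                     + friction_term(q_next_in, p_next)))


# A pipe at rest (uniform pressure, zero flow) satisfies both equations exactly.
params = dict(c=300.0, A=0.32, D=0.64, lam=0.033, dt=26.67)
rest_pos = positive_characteristic_residual(60e5, 0.0, 60e5, 0.0, **params)
rest_neg = negative_characteristic_residual(60e5, 0.0, 60e5, 0.0, **params)
print(rest_pos, rest_neg)
```

Stacking such residuals for all grid points, together with the leak balance (21.38a), gives the vector function whose root is sought at every time step.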



The above system of equations can be shown (Benkherouf and Allidina, 1988; 
Gunawickrama, 2001) in the compact form of a nonlinear system: 

$$f(x^k, x^{k-1}, u^k, u^{k-1}) = 0, \tag{21.41}$$

which consists of $2M-2$ nonlinear equations, containing the input vector 

$$u^k = \begin{bmatrix} p_1^k & q_M^k \end{bmatrix}^T$$

and the state vector 

$$x^k = \begin{bmatrix} p_2^k \; p_3^k \; \cdots \; p_M^k \mid q_1^k \; q_2^k \; \cdots \; q_{M-1}^k \mid q_{L2}^k \; q_{L3}^k \; \cdots \; q_{L(M-1)}^k \end{bmatrix}^T \in \mathbb{R}^{3M-4},$$

where, for notational convenience, the ‘minus’ sign indicating the incoming 
mass flow rate and used in subscripts is neglected, i.e., $q_d \equiv q_{d^-}$. 



Numerical solution to state variable representation 

The preceding nonlinear state variable representation (21.41) cannot be 
solved explicitly. Still, based on this full-blown nonlinear model, an 
iterative numerical method can be used to estimate the state vector $x^k$ with 
assumed accuracy, at a given time moment $k$. One of the possibilities, 
referring to Newton’s discrete method of solving nonlinear systems, is expressed 
in the following procedure (Kowalczuk and Gunawickrama, 2000): 

Procedure 21.1. (Finding an equilibrium point of the pipe flow) 

0° Set the two initial values $x^0$ and $x^1$ to $x^{k-1}$ (i.e., $x^0 = x^1 = x^{k-1}$) 
and let $i$ define the current iteration index (initially $i = 1$). 

1° Calculate recursively, for $i = 1, 2, 3, \ldots$, 

$$x^{i+1} = x^i - J^{-1}(x^i)\, f(x^i, x^{k-1}, u^k, u^{k-1}), \tag{21.42}$$

where $J^{-1}(x^i)$ is the left pseudo-inverse$^{*}$ of the discrete Jacobian 
$J(x^i) \in \mathbb{R}^{(2M-2)\times(3M-4)}$, approximately evaluated as 

$$J(x^i) = J[m,n] \approx \frac{f_m(x^i + I_n \Delta x) - f_m(x^i)}{\Delta x_n}$$

for $m = 1, 2, \ldots, 2M-2$; $n = 1, 2, \ldots, 3M-4$; $f_m$ is the $m$-th 
element of the vector function $f$; $\Delta x_n$ denotes the $n$-th element of 
any vector $\Delta x \in \mathbb{R}^{3M-4}$ of sufficiently small nonzero 
elements; and $I_n \in \mathbb{R}^{(3M-4)\times(3M-4)}$ is the matrix that 
selects the $n$-th element ($I_n[n,n] = 1$, with all the other elements equal 
to zero). 

2° Terminate the calculations if 

$$\|x^{i+1} - x^i\| < \zeta, \tag{21.43}$$

where $\|\cdot\|$ indicates the infinity vector norm$^{**}$ and $\zeta$ denotes 
the accuracy of the solution. Then the approximate solution of the state space 
system (21.41) at the time $k$ is $x^k = x^{i+1}$. 

Thus, by knowing the previous state $x^{k-1}$ and input $u^{k-1}$ as well as 
the current input $u^k$ available from measurements, the current pipe state 
$x^k$ can be predicted on-line. 

Note that prior to any application of iterative numerical methods for 
solving a set of nonlinear equations, it is necessary to perform a test for the 
convergence of the resulting series (21.42). The best-established techniques 
can be found in the common literature. 
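The iteration of Procedure 21.1 can be illustrated by a minimal Python sketch. To keep it self-contained, it makes two simplifying assumptions that differ from the pipe model: the system is square (as many equations as unknowns), so an ordinary linear solve stands in for the left pseudo-inverse, and a small toy system replaces the pipe equations (21.41). All function names are hypothetical.

```python
def numerical_jacobian(f, x, dx=1e-6):
    """Forward-difference Jacobian J[m][n] ~ (f_m(x + dx*e_n) - f_m(x)) / dx."""
    f0 = f(x)
    jac = [[0.0] * len(x) for _ in f0]
    for n in range(len(x)):
        x_shifted = list(x)
        x_shifted[n] += dx                    # perturb only the n-th element
        f_shifted = f(x_shifted)
        for m in range(len(f0)):
            jac[m][n] = (f_shifted[m] - f0[m]) / dx
    return jac


def solve_linear(a, b):
    """Solve the square system a*y = b by Gaussian elimination with pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][k] * y[k] for k in range(r + 1, n))
        y[r] = (aug[r][n] - tail) / aug[r][r]
    return y


def newton_discrete(f, x_start, accuracy=1e-8, max_iter=50):
    """Iterate x <- x - J^{-1}(x) f(x) until the infinity norm of the step is small."""
    x = list(x_start)
    for _ in range(max_iter):
        step = solve_linear(numerical_jacobian(f, x), f(x))
        x_new = [xi - si for xi, si in zip(x, step)]
        if max(abs(a - b) for a, b in zip(x_new, x)) < accuracy:
            return x_new
        x = x_new
    raise RuntimeError("Newton iteration did not converge")


def toy_system(x):
    """Toy stand-in for f: a circle of radius 2 intersected with the line x0 = x1."""
    return [x[0] ** 2 + x[1] ** 2 - 4.0, x[0] - x[1]]


solution = newton_discrete(toy_system, [1.0, 2.0])
print(solution)
```

In the pipe observer the role of the starting point is played by the previous state, which is usually close to the sought solution, so only a few iterations are typically needed.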

21.5.2. Determining the leak size and location 

As the state vector $x^k$ contains the structurally modelled leak flow rates 
at predefined locations, it is possible to estimate the real leak parameters 
(Benkherouf and Allidina, 1988) on the basis of the mass and momentum 
conservation principles. 

$^{*}$ See Section 6.5 in Chapter 6. 

$^{**}$ Which is simply a maximal element of the vector under the norm. 

Similarly as before, (21.13)-(21.14), in a steady state, when 
$\partial p/\partial t = 0$ and $\partial q/\partial t = 0$, the partial 
differential equations of (21.30)-(21.31) describing a leak-free pipeline become 

$$\frac{\partial q}{\partial z} = 0, \tag{21.44}$$

$$\frac{\partial p}{\partial z} + \frac{\lambda c^2}{2DA^2}\frac{q|q|}{p} = 0, \tag{21.45}$$

and from (21.44) it results that 

$$q_z = \text{const} = \bar{q}, \tag{21.46}$$

which means that the flow rate in a steady state, call it $\bar{q}$, is 
independent of both $z$ and $t$. This (positive) value can be determined, for 
instance, from the boundary condition at $z = Z$: $\bar{q} = q_Z$. 

Substituting the obtained $\bar{q}$ into (21.45) and rearranging give 

$$\frac{\partial (p^2)}{\partial z} = -\frac{\lambda c^2}{DA^2}\,\bar{q}^2.$$

By integrating the above, we obtain the following quadratic pressure drop 
equation: 

$$(p_z)^2 - (p_0)^2 = -\frac{\lambda c^2}{DA^2}\,\bar{q}^2 z, \tag{21.47}$$

where $p_z$ denotes the steady-state pressure at the distance $z$ from the pipe 
inlet. 
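As an illustration of the quadratic pressure drop equation (21.47), the following Python sketch evaluates a steady-state pressure profile along a leak-free pipe. It assumes a circular cross-section, $A = \pi D^2/4$, and SI units; the function name and the numerical values in the usage lines are illustrative assumptions rather than the data of the simulated pipeleg.

```python
import math


def steady_state_pressure(p0, q_bar, z, lam, c, D):
    """Steady-state pressure [Pa] at the distance z [m] from the inlet.

    Implements p_z^2 - p_0^2 = -(lam * c^2 / (D * A^2)) * q_bar^2 * z,
    with A taken as the area of a circular pipe of diameter D (an assumption).
    """
    area = math.pi * D ** 2 / 4.0
    p_squared = p0 ** 2 - lam * c ** 2 / (D * area ** 2) * q_bar ** 2 * z
    if p_squared <= 0.0:
        raise ValueError("pressure drop exceeds the inlet pressure")
    return math.sqrt(p_squared)


# Illustrative values: 80 bar inlet pressure, 120 kg/s, grid points every 8 km.
positions = [0.0, 8_000.0, 16_000.0, 24_000.0, 32_000.0, 40_000.0]
profile = [steady_state_pressure(80e5, 120.0, z, 0.033, 300.0, 0.64)
           for z in positions]
for z, p in zip(positions, profile):
    print(f"z = {z / 1000:5.1f} km   p = {p / 1e5:6.2f} bar")
```

The squared pressure falls linearly with distance, so the pressure itself drops slightly faster towards the outlet than a straight line would suggest.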

To derive the relationship (see Benkherouf and Allidina, 1988) between 
the real leakage and the many modelled ones, let us consider two identical 
pipe segments with the same boundary conditions $p_0$ and $q_Z$ as shown in 
Fig. 21.9. We assume one leak with the parameters $(q_L, z_L)$ in Model I and 
two leaks $(q_{L1}, z_{L1})$ and $(q_{L2}, z_{L2})$ in Model II. Our objective 
is to find $q_L$ and $z_L$ such that the steady state conditions in Model I are 
the same as in Model II. 

By comparing the loss of mass (conservation of mass) in both models we 
see that 

$$\bar{q} + q_L = \bar{q} + q_{L1} + q_{L2}, \tag{21.48a}$$

$$q_L = q_{L1} + q_{L2}. \tag{21.48b}$$

The application of (21.47) as the quadratic pressure drop (conservation of 
momentum) to Model I gives 

$$(p_Z)^2 - (p_0)^2 = \{(p_{z_L})^2 - (p_0)^2\} + \{(p_Z)^2 - (p_{z_L})^2\} = -\frac{\lambda c^2}{DA^2}\,\bar{q}^2\left\{\left(1 + \frac{q_L}{\bar{q}}\right)^2 z_L + (Z - z_L)\right\}, \tag{21.49}$$

while its utilisation to the three leak-free segments in Model II leads to 

$$(p_Z)^2 - (p_0)^2 = \{(p_{z_{L1}})^2 - (p_0)^2\} + \{(p_{z_{L2}})^2 - (p_{z_{L1}})^2\} + \{(p_Z)^2 - (p_{z_{L2}})^2\}$$

$$= -\frac{\lambda c^2}{DA^2}\,\bar{q}^2\left\{\left(1 + \frac{q_{L1} + q_{L2}}{\bar{q}}\right)^2 z_{L1} + \left(1 + \frac{q_{L2}}{\bar{q}}\right)^2 (z_{L2} - z_{L1}) + (Z - z_{L2})\right\}. \tag{21.50}$$

Fig. 21.9. Steady-state dynamics in two identical pipe 
models with one leak (I) and with two leaks (II) 

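As the momentum losses in both models are to be identical, it results from 
(21.49)-(21.50) that 

$$2\frac{q_L}{\bar{q}}z_L + \left(\frac{q_L}{\bar{q}}\right)^2 z_L = 2\frac{q_{L1}}{\bar{q}}z_{L1} + \left(\frac{q_{L1}}{\bar{q}}\right)^2 z_{L1} + 2\frac{q_{L2}}{\bar{q}}z_{L2} + \left(\frac{q_{L2}}{\bar{q}}\right)^2 z_{L2} + 2\frac{q_{L1}q_{L2}}{\bar{q}^2}z_{L1}. \tag{21.51}$$

Now, by virtue of the assumption that the leakage is usually very small as 
compared to the main steady-state flow $\bar{q}$, the following relationships 
become true: 

$$2\frac{q_L}{\bar{q}}z_L \gg \left(\frac{q_L}{\bar{q}}\right)^2 z_L, \qquad 2\frac{q_{L1}z_{L1} + q_{L2}z_{L2}}{\bar{q}} \gg \frac{q_{L1}^2 z_{L1} + q_{L2}^2 z_{L2} + 2q_{L1}q_{L2}z_{L1}}{\bar{q}^2}. \tag{21.52}$$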
Thus the second order terms can be neglected in (21.51), leading to the 
balancing expression 

$$q_L z_L \approx q_{L1}z_{L1} + q_{L2}z_{L2}. \tag{21.53}$$

By induction, the equations (21.48b) and (21.53) can be generalised to a 
greater number of leaks (in Model II). That is, the leak $(q_{L2}, z_{L2})$ can 
itself be decomposed into two leaks $(q_{L3}, z_{L3})$ and $(q_{L4}, z_{L4})$ 
such that $q_{L2} = q_{L3} + q_{L4}$ and 
$q_{L2}z_{L2} = q_{L3}z_{L3} + q_{L4}z_{L4}$, and thus 
$q_L = q_{L1} + q_{L3} + q_{L4}$ and 
$q_L z_L = q_{L1}z_{L1} + q_{L3}z_{L3} + q_{L4}z_{L4}$. Next, the leak 
$(q_{L4}, z_{L4})$ can be decomposed into two other equivalent leaks, and so on. 

In such a way the generalised forms of both relationships (21.48b) 
and (21.53) for $n$ leaks can be readily given as 

$$q_L = q_{L1} + q_{L2} + \cdots + q_{Ln}, \tag{21.54}$$

$$q_L z_L = q_{L1}z_{L1} + q_{L2}z_{L2} + \cdots + q_{Ln}z_{Ln}. \tag{21.55}$$

On the basis of the structurally modelled leak flow rates within the state 
vector $x^k$ and the analogy resulting from (21.54)-(21.55) we can directly 
estimate the true leak parameters. 
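How well the first-order balance works for small leaks can be checked numerically. The sketch below compares the steady-state drop of the squared pressure in a pipe with two leaks against that of a pipe with the single equivalent leak obtained by superposition. The helper names and the leak values are assumptions chosen for illustration, and the common factor $\lambda c^2/(DA^2)$ is set to one because it cancels in the relative comparison.

```python
def pressure_drop_sq(segments, q_out, friction_const):
    """Return p_0^2 - p_Z^2 for consecutive leak-free segments.

    segments: list of (length, extra_flow) pairs ordered from the inlet, where
    extra_flow is the leak flow still carried by the segment on top of q_out.
    """
    return sum(friction_const * (q_out + extra) ** 2 * length
               for length, extra in segments)


def equivalent_leak(leaks):
    """Single-leak equivalent (q_L, z_L) of the leaks [(q_Li, z_Li), ...]."""
    q_total = sum(q for q, _ in leaks)
    z_equiv = sum(q * z for q, z in leaks) / q_total
    return q_total, z_equiv


Z, q_bar = 40_000.0, 120.0                       # pipe length [m], outlet flow [kg/s]
leaks = [(1.2, 16_000.0), (2.4, 24_000.0)]       # two small leaks (illustrative)
q_L, z_L = equivalent_leak(leaks)

# Model I: one equivalent leak; Model II: the two original leaks.
drop_one_leak = pressure_drop_sq([(z_L, q_L), (Z - z_L, 0.0)], q_bar, 1.0)
drop_two_leaks = pressure_drop_sq(
    [(16_000.0, 1.2 + 2.4), (8_000.0, 2.4), (16_000.0, 0.0)], q_bar, 1.0)
relative_gap = abs(drop_one_leak - drop_two_leaks) / drop_two_leaks
print(q_L, z_L, relative_gap)
```

For leaks amounting to a few percent of the main flow, the two models differ only by the neglected second-order terms, which is why the gap remains very small.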

Leak size 

The knowledge of the structurally modelled leak flow rates 
$\hat{q}_{L2}^k, \hat{q}_{L3}^k, \ldots, \hat{q}_{L(M-1)}^k$ (from the estimated 
state $\hat{x}^k$) allows us to compute the current leak size at the discrete 
time $k$ via the following: 

$$\hat{q}_L^k = \sum_{i=2}^{M-1} \hat{q}_{Li}^k. \tag{21.56}$$

Leak location 

A suitable estimate of the leak location can be found from (21.55) as 

$$\hat{z}_L^k = \frac{1}{\hat{q}_L^k}\sum_{i=2}^{M-1} \hat{q}_{Li}^k z_{Li}. \tag{21.57}$$

Note that, in practice, the leak parameters should be low-pass filtered (via 
recursive averaging with fading memory, for instance). 
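The two estimates and the suggested smoothing can be sketched in Python as follows. The function names are hypothetical, the forgetting factor of 0.9 is an arbitrary assumption, and the example leak flows are invented so that they mimic a leak lying halfway between two grid points.

```python
def leak_estimates(leak_flows, section_length):
    """Leak size and location from the modelled leak flows at d = 2, ..., M - 1.

    leak_flows[0] belongs to the grid point d = 2, whose position is
    z = (d - 1) * dz = 1 * dz; the following entries are spaced by dz.
    """
    size = sum(leak_flows)
    if size == 0.0:
        return 0.0, None                  # the location is undefined without a leak
    moment = sum(q * (i + 1) * section_length
                 for i, q in enumerate(leak_flows))
    return size, moment / size


def fading_memory_average(previous, sample, forgetting=0.9):
    """Recursive averaging with fading memory (exponential smoothing)."""
    return forgetting * previous + (1.0 - forgetting) * sample


# M = 6 and dz = 8 km: modelled leaks at 8, 16, 24 and 32 km.
size, location = leak_estimates([0.0, 1.8, 1.8, 0.0], 8.0)
print(size, location)

smoothed = 0.0
for raw in [3.0, 3.9, 3.5, 3.6]:          # noisy leak-size estimates (illustrative)
    smoothed = fading_memory_average(smoothed, raw)
print(smoothed)
```

Since the location estimate divides by the estimated leak size, it is only meaningful once the size is clearly above the noise level, for example after the leak alarm has been raised.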

21.5.3. Minimisation of modelling errors via the extended 
Kalman filtering 

In order to minimise the estimation error of the pipe state $x^k$ and the 
resulting overall modelling errors, the method of the Extended Kalman Filtering 
(EKF) can be applied, which is matched to a system model linearised around 
a stationary system state $\bar{x}$. 

Model linearisation 

Suppose that a nominal solution (the current steady state) to the nonlinear 
state variable model (21.41) is known and given by 
$(\bar{x}, \bar{x}, \bar{u}, \bar{u})$. The difference between these nominal 
vectors and some slightly perturbed vectors $(x^k, x^{k-1}, u^k, u^{k-1})$ can 
be defined by 

$$\delta x^k = x^k - \bar{x}, \quad \delta x^{k-1} = x^{k-1} - \bar{x}, \quad \delta u^k = u^k - \bar{u}, \quad \delta u^{k-1} = u^{k-1} - \bar{u}. \tag{21.58}$$

By linearising the system (21.41) with $\delta u = 0$ and using the Taylor 
series expansion around a settled $\bar{x}$, we find that the equilibrium point 
(steady state) is locally governed by 

$$\delta x^k = A\,\delta x^{k-1} + w^{k-1}, \tag{21.59}$$

where $\delta x^k$ constitutes a state vector of the linear process (21.59) with 
a state transition matrix $A$, and $w^{k-1}$ is a vector sample of a process 
noise sequence. The observation process is given by 

$$\delta y^k = H\,\delta x^k + v^k, \tag{21.60}$$

where $v^k$ represents an observation noise process and $H$ is an observation 
matrix, which maps the measurable quantities of $\delta x^k$ into $\delta y^k$. 


Extended Kalman filter 

Let us assume that the disturbances $w^k$ and $v^k$ are (mutually and 
internally) uncorrelated zero-mean Gaussian-noise sequences with known 
covariance matrices $S$ and $R$: 

$$E\{w^k\} = 0, \quad E\{w^k (w^l)^T\} = S\,\delta(k-l), \quad E\{v^k\} = 0, \quad E\{v^k (v^l)^T\} = R\,\delta(k-l),$$

$$E\{w^k (v^l)^T\} = 0, \quad E\{\delta x^k (w^k)^T\} = 0, \quad E\{\delta x^k (v^k)^T\} = 0,$$

where the Kronecker delta $\delta(k-l)$ equals 1 if $k = l$ and is zero otherwise. 

If the current pipe state $x^k$ is predicted by solving the nonlinear 
system (21.41) (by using Procedure 21.1, for instance) and some additional 
(with respect to the structure of the original pipe model) pressure or flow 
rate measurements $y^k$ along the pipe are available, then the small-signal 
vectors $\delta x^k = x^k - \bar{x}$ and $\delta y^k = y^k - \bar{y}$ can be 
used to generate the error between the observations and their estimates, 
$e^k = \delta y^k - H\,\delta x^k$. Next, in order to obtain a better (in terms 
of minimising the error $e^k$) estimation of $x^k$, the extended Kalman filter 
can be applied utilising the following quantities: 

$x^k$: an a priori predicted state vector based on past data, up to and 
including the previous time instant $k-1$, 

$\hat{x}^k$: an a posteriori estimated state vector based on past data, up to 
and including the current time instant $k$, 

$\delta\hat{x}^k = \hat{x}^k - \bar{x}$: an estimated (corrective small-signal) 
state vector of the Kalman filter, 

$\delta y^k = y^k - \bar{y}$: a measured system-output (small-signal) vector, 

$V'^k = E\{\delta x^k (\delta x^k)^T\}$: an a priori state covariance matrix, 

$V^k = E\{\delta\hat{x}^k (\delta\hat{x}^k)^T\}$: an a posteriori state 
covariance matrix. 

A complete FMA leak detection algorithm (Benkherouf and Allidina, 1988) 
based on the EKF can be summarised in the form of Procedure 21.2. 

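Procedure 21.2. (Executing the fault model leak detection procedure of the 
FMA-LDS) 

0° Initiate $\bar{x}$, $V^0$, $R$, $S$, $\hat{x}^0 = \bar{x}$, 
$\delta\hat{x}^0 = 0$ and $k = 1$. 

1° Measure the pipe input $u^k$ and output $y^k$. 

2° Evaluate the state vector $x^k$ by solving the nonlinear system 
$f(x^k, \hat{x}^{k-1}, u^k, u^{k-1}) = 0$ and determine the small-signal 
variables $\delta x^k$ and $\delta y^k$. 

3° If a new stationary state $\bar{x}$ is detected, provide the corresponding 
modifications to $A$. 

4° Apply the Kalman filter to obtain a better a posteriori estimate $\hat{x}^k$ 
based on the latest measurement $y^k$: 

$$V'^k = A V^{k-1} A^T + S,$$

$$L^k = V'^k H^T \left(H V'^k H^T + R\right)^{-1},$$

$$V^k = (I - L^k H)\, V'^k,$$

$$\delta\hat{x}^k = \delta x^k + L^k\left(\delta y^k - H\,\delta x^k\right),$$

$$\hat{x}^k = \bar{x} + \delta\hat{x}^k.$$

5° Estimate the leak size and position at the current time $k$: 

$$\hat{q}_L^k = \sum_{i=2}^{M-1} \hat{q}_{Li}^k, \qquad \hat{z}_L^k = \frac{1}{\hat{q}_L^k}\sum_{i=2}^{M-1} \hat{q}_{Li}^k z_{Li}.$$

6° Increment the time index $k$ and go to step 1°. 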
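The correction of step 4° can be illustrated with a scalar stand-in, in which the state, the measurement and all the matrices reduce to single numbers. This is only a sketch of the standard Kalman update under that simplifying assumption; the function name and the numbers in the usage lines are hypothetical, whereas the actual filter operates on the full state vector.

```python
def kalman_correction(dx_pred, v_prev, dy, a, h, s, r):
    """One scalar Kalman correction in the spirit of step 4 of Procedure 21.2.

    dx_pred : predicted small-signal state (from the nonlinear observer)
    v_prev  : previous a posteriori variance
    dy      : measured small-signal output
    a, h    : state-transition and observation coefficients
    s, r    : process and observation noise variances
    """
    v_prior = a * v_prev * a + s                     # a priori variance
    gain = v_prior * h / (h * v_prior * h + r)       # Kalman gain
    v_post = (1.0 - gain * h) * v_prior              # a posteriori variance
    dx_post = dx_pred + gain * (dy - h * dx_pred)    # corrected small-signal state
    return dx_post, v_post, gain


x_bar = 60.0                                         # stationary state (illustrative)
dx_post, v_post, gain = kalman_correction(
    dx_pred=0.0, v_prev=4.0, dy=1.0, a=1.0, h=1.0, s=1.0, r=1.0)
x_hat = x_bar + dx_post                              # a posteriori state estimate
print(gain, v_post, x_hat)
```

A large observation noise variance makes the gain small, so the corrected state stays close to the observer prediction; a small one pulls it towards the measurement.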
Effect of shifts in the operating point 

The described technique can, however, be ineffective (in terms of minimising 
the modelling errors) under significant operational variations of the pipe 
because the system matrix $A$ is computed for a given operating point. 
Therefore, the on-line monitoring of ‘stationary states’ $\bar{x}$, which undergo 
technological transients, may be necessary. Such monitoring, however, has to be 
done carefully by applying specific methods, such as generalised likelihood 
ratio tests, Bayesian decisions, run-sum tests or two-probe t-tests 
(see, for instance, Basseville, 1988), which do not permit leak symptoms to 
vanish in time and are appropriate for FDI. Stationarity monitoring and FDI 
processing can be done in parallel and, under specific conditions, can interact 
accordingly. For instance, after a stationary state is detected, the estimated 
value of $\bar{x}$ can be applied in the model linearisation procedure, and a 
suitable modification of the extended Kalman filter can then be performed. Thus, 
within these deliberations, it is assumed that leaks can be detected, precisely 
isolated and estimated during stationary periods of the monitored mass flow 
process (Kowalczuk and Gunawickrama, 2000). 

21.5.4. Exemplary monitoring through the FMA-LDS 

The effectiveness (in terms of sensitivity, accuracy, reliability and robustness) 
of the FMA leak detection algorithm for a given pipeleg under selected operat- 
ing conditions was tested with the use of the pipe simulator (Gunawickrama, 
2001). The emulated pressure and flow rate measurements corresponding to 
various operating conditions of the pipeleg were input to the FMA estimation 
system on-line. The results of an extended study are presented by Gunawickrama 
(2001). In the present chapter only some selected results are shown for 
illustrative purposes. 

The supervised pipeleg had the same specifications (Table 21.2) as those 
used in the simulation carried out while testing the FSA (Section 21.4). Like- 
wise, both the measurement process and the analysed leak are characterised 
by Table 21.3 and Table 21.4, respectively, whereas the set of parameters 
pertinent to the FMA method is given in Table 21.6. Some of the parameters 
were fixed on a trial-and-error basis. 

Table 21.6. Parameters of the fault-model algorithm (FMA) 
applied ($I_i$ is the identity matrix with $i$ rows and $i$ columns, 
while $0_{i \times j}$ stands for the null matrix of $i$ rows and $j$ columns) 

Parameter                 Symbol       Value                  Unit

State observer
Pipe length               $Z$          40.0                   [km]
Pipe diameter             $D$          640                    [cm]
Velocity of sound         $c$          300                    [m/s]
Friction coefficient      $\lambda$    0.033                  -
Number of sections        $M$          6                      -
Iteration time interval   $\Delta t$   26.67 ($\neq T_s$)     [s]
Numerical accuracy        $\zeta$      0.01                   -

Extended Kalman filter
Initial steady state      $\bar{x} \in \mathbb{R}^{14}$   $[76.32 \;\; 72.35 \;\; 68.37 \;\; 64.39 \;\; 60.42 \;\; 120 \;\; \ldots \;\; 120 \;\; 0 \;\; \ldots \;\; 0]^T$
Covariance matrices       $V^0$, $S \in \mathbb{R}^{14 \times 14}$, $R$

$$V^0 = \begin{bmatrix} \sigma_p^2 I_5 & 0_{5\times5} & 0_{5\times4} \\ 0_{5\times5} & \sigma_q^2 I_5 & 0_{5\times4} \\ 0_{4\times5} & 0_{4\times5} & \sigma_{qL}^2 I_4 \end{bmatrix}, \quad \sigma_p = 2 \text{ [bar]}, \; \sigma_q = 6 \text{ [kg/s]}, \; \sigma_{qL} = 3 \text{ [kg/s]},$$

$$S = \begin{bmatrix} \sigma_p^2 I_5 & 0_{5\times5} & 0_{5\times4} \\ 0_{5\times5} & \sigma_q^2 I_5 & 0_{5\times4} \\ 0_{4\times5} & 0_{4\times5} & \sigma_{qL}^2 I_4 \end{bmatrix}, \quad \sigma_p = 1 \text{ [bar]}, \; \sigma_q = 2 \text{ [kg/s]}, \; \sigma_{qL} = 1 \text{ [kg/s]},$$

and $R$ is a diagonal matrix of the observation noise variances. 

As has been mentioned, the number of the hypothetical pipe sections, 
which define the instrumental grid on the $(z,t)$ plane, is an important 
parameter of the state observer. With $M = 6$, i.e., with five pipe sections of 
the length $\Delta z = 8$ [km], the time span (the FMA iteration period) amounts 
to $\Delta t = 26.67$ [s] (see Subsection 21.5.1). As the measurements were 
sampled every $T_s = 10$ [s] while the observer and filter ran every 
$\Delta t = 26.67$ [s], the corresponding pressure and flow rate measurements 
had to be obtained by interpolation. Moreover, we assumed that an additional 
(apart from the boundary measurements $Q_{in}$ and $P_{ex}$) point of pressure 
measurements at the 16 [km] was available on-line. Those are necessary 
references for the Kalman filter and thus for the improvement of the FMA 
adaptivity. The FMA algorithm was implemented with slightly different 
parameters than those of the true model (compare Tables 21.2 and 21.3 with 
Table 21.6), which means that the simulations included some representative 
modelling errors. The steady state $\bar{x}$ used for computing the Kalman state 
innovative sequence $\delta x^k = x^k - \bar{x}$ was gained by the low-pass 
filtering of the measurements at the pipe boundaries. The steady-state pressure 
values were derived from the assumption about a linear pressure drop along the 
pipeline, while the flow rates $\bar{q}$ were taken by averaging measurements. 
The referencing structurally modelled leak flow rates contained in $\bar{x}$ 
were, obviously, set to zero. 
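The mismatch between the sampling period and the iteration period can be handled, for example, by linear interpolation between neighbouring samples. The following Python sketch shows this option; the choice of linear interpolation, the function name and the sample values are assumptions, since the kind of interpolation used in the simulations is not specified here.

```python
def interpolate_samples(samples, sample_period, query_time):
    """Linearly interpolate uniformly sampled measurements at an arbitrary time.

    samples[i] is assumed to be taken at the time i * sample_period.
    """
    if len(samples) < 2:
        raise ValueError("at least two samples are needed")
    position = query_time / sample_period
    if position < 0.0 or position > len(samples) - 1:
        raise ValueError("query time outside the sampled interval")
    index = min(int(position), len(samples) - 2)     # left neighbour
    fraction = position - index
    return samples[index] + fraction * (samples[index + 1] - samples[index])


T_s = 10.0                                # sampling period [s]
dt = 8_000.0 / 300.0                      # iteration period dz / c, about 26.67 [s]
pressure_samples = [60.0, 60.5, 61.0, 61.5, 62.0, 62.5]    # illustrative, [bar]

# Measurement values seen by the observer at k = 0, 1.
observer_inputs = [interpolate_samples(pressure_samples, T_s, k * dt)
                   for k in range(2)]
print(observer_inputs)
```

An alternative design would be to choose the grid so that $\Delta t$ is a multiple of $T_s$, at the cost of a less convenient section length.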

The above FMA-LDS system was applied to monitor the process of 
pipe transmission. Similarly to the experimentation of Subsection 21.4.5, the 
simulated system was subjected to varying operational conditions. They were 
defined by two technological shifts of the operating point (invoked by the 
variation of the inlet pressure at $t_{op1} = 15$ [min] and $t_{op2} = 175$ [min]) 
and a leakage (represented by the parameters of Table 21.4) emulated at 
$t_L = 75$ [min]. The resulting pipeline transients are depicted in 
Figs. 21.10(a) and (b) in terms of the available measurements, and the run-time 
performance of the FMA-LDS is shown in Figs. 21.10(b), (c), (e)-(g). 

Despite the varying pipe dynamics, the observer continues to function 
satisfactorily by producing agreeably accurate estimates $\hat{p}_6$ 
(Fig. 21.10(b)) and $\hat{q}_1$ (Fig. 21.10(c)) of the corresponding 
measurements $P_{ex}$ and $Q_{in}$. However, leak isolation (Figs. 21.10(f) 
and (g)) based on the structurally modelled leak flow rates $\hat{q}_{L2}$, 
$\hat{q}_{L3}$, $\hat{q}_{L4}$ and $\hat{q}_{L5}$ (Fig. 21.10(e)) is quite 
sensitive to both the operational variations and leak occurrence, which leads to 
false leak alarms considerably reducing the reliability of the FMA. The good 
news is that these alarms have a clear transient characteristic, and a simple 
pattern recognition technique should be sufficient to avoid such false alarms 
(Kowalczuk and Gunawickrama, 2001). 
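One very simple way of exploiting the transient character of such alarms is to confirm an alarm only after the estimated leak flow has stayed above the threshold for a number of consecutive iterations. The Python sketch below illustrates this idea; it is a hypothetical persistence test and not the pattern recognition technique of the cited work, and the required persistence would have to reflect the length of the adaptation transients.

```python
def persistent_alarm(leak_flow_estimates, threshold=0.1, min_consecutive=5):
    """Return the iteration index at which a leak alarm is confirmed, or None.

    The alarm is confirmed only when the estimate exceeds the threshold for
    min_consecutive successive iterations, so that short transients are ignored.
    """
    run = 0
    for k, estimate in enumerate(leak_flow_estimates):
        run = run + 1 if estimate > threshold else 0
        if run >= min_consecutive:
            return k
    return None


transient = [0.0, 0.5, 0.6, 0.0, 0.0, 0.0]      # short overshoot (illustrative)
sustained = [0.0, 0.0, 1.0, 1.2, 1.1, 1.0]      # persistent leak flow (illustrative)
print(persistent_alarm(transient, min_consecutive=3))
print(persistent_alarm(sustained, min_consecutive=3))
```

Such a test trades detection delay for reliability: the longer the required persistence, the fewer false alarms, but the later a true leak is reported.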

Consider the first shift in the operating point at $t_{op1}$. As is shown in 
Fig. 21.10(e), the structurally modelled leak flow rates (after positive 
overshooting) soon adapt to the new operating conditions, causing their total 
leak flow rate $\hat{q}_L$ (Fig. 21.10(f)) to return towards zero, thus 
indicating new leak-free pipe dynamics. Even though the leak appears (at $t_L$) 
before $\hat{q}_L$ has fallen below the alarm threshold of 0.1 (i.e., before 
completing the adaptation phase), the FMA algorithm quite accurately captures 
information on the leak (Fig. 21.10(f) and (g)). The estimation process was, 
however, influenced by the second shift of the operating point (at $t_{op2}$). 
The resulting negative overshoots of the structurally modelled leak flow rates 
have similar effects on the estimated leak parameters as compared to the 
operational variation at $t_{op1}$. In spite of that, the estimated leak 
parameters soon stabilise around their proper values, demonstrating the desired 
adaptive capability of FMA algorithms. 

Despite the above-mentioned deficiency in reliability caused by significant 
operational variations, the FMA-LDS system continues to function effectively 
by maintaining the capability of providing robustly accurate information on the 
pipe leakage. Moreover, since the parameters of technological shifts in the 
operating point (magnitudes of pressure variations and moments of their 
occurrence, for instance) are known a priori, false alarms related to these 
conditions can be easily ignored. 

Fig. 21.10. Leak detection via the FMA under varying operational 
conditions with the generic time moments: $t_{op1}$ - first 
variation, $t_L$ - leak, $t_{op2}$ - second variation 

In conclusion, we recall that the FMA-LDS system is based on nonlinear 
state observation and the extended Kalman filtering, which minimises 
the system’s overall modelling errors. The leak parameters are obtained by 
means of structurally modelled leaks assigned to predetermined locations 
along the monitored pipeleg. The estimated leak flow rate, being the sum of 
the structurally modelled leak flow rates, is ideally zero in leak-free pipeline 
operations. If this estimate exceeds a certain threshold, the leak alarm is 
triggered and the estimation of the leak location (which is also computed based 
on the structurally modelled leak flow rates) is initiated. A well-tuned FMA 
algorithm is capable of detecting small leaks early by yielding quite accurate 
information on the leak parameters. As the structurally modelled leak flow 
rates belong to the set of the estimated pipe states, the accuracy and 
sensitivity of the leak detection algorithm highly depend on the dynamics of 
the state observer. False alarms that can arise under technological shifts of 
the pipe’s operating point significantly reduce the system’s reliability. 
Nevertheless, the FMA-LDS system soon adapts to the new operating conditions, 
causing the false alarm to disappear, which sustains the general suitability of 
the method for leak detection. Thus the operational variations can temporarily 
contaminate the leak parameter estimates but the FMA-LDS system is 
robust enough not to lose the valuable information on the existing leak. 



21.6. Summary 

In this chapter we have considered internal methods of a rapid detection of 
leaks for gas and liquid transmission pipelines, whose main objectives, i.e., 

• reliable leak alarms (sufficiently sensitive to small leaks), and 

• an accurate determination of leak parameters (size [kg/s] and 
location [km]), 

are achieved on the basis of the robust identification of the leak-resultant, 
leak-parameter-dependent physical phenomenon called the low-pressure 
rarefaction wave, which propagates away from the leakage at the speed of sound. 
The leak effect can, therefore, be rapidly tracked by processing the internal 
pipeline quantities measured on-line (preferably only the pressure and flow at 
the pipe boundaries) via the advanced hydraulic modelling of complex fluid 
mechanics. 

An improved balancing technique uses simple signal-based volume balancing 
and correlation techniques. The IBA determines a reference to the 
leak-free pipe dynamics and compares it to the corresponding trajectory 
of on-line measurements for discrepancies, which are known to characterise 
leaks. When statistically tested deviations trigger the leak alarm, the refer- 
ence signals are frozen. Therefore, a comparison of the pipe dynamics before 
and after the leak is possible and can serve as the basis for estimating the 
critical leak parameters. Suitable relationships for the leak parameters can 
be derived by the qualitative and quantitative evaluation of the leak effects 
on the pipeline dynamic deviations. 

A fault-sensitive approach to leak detection is based on nonlinear adap- 
tive state observation and residue generation. A distributed parameter system 
describing the fluid flow (gained from the principles of mass and momentum 
conservation) is lumped by applying a centred difference scheme, and the 
obtained discrete-time discrete-space model is then utilised in the construction 
of the state observer. The leak-free pipe state estimated on-line consists of 
pressure and flow rate variables associated with predefined locations along 
the pipeline. The discrepancy between the estimated and measured pipe 
dynamics (residuals) is statistically tested for leak alarms and for parameter 
estimation (by utilising a technique similar to the IBA). In order to minimise 
unavoidable modelling errors, an adaptive mechanism that updates the value 
of the immeasurable friction coefficient within the observer is recommended. 

The fault-model approach to leak detection is also based on nonlinear 
distributed parameter system modelling and state estimation. As opposed 
to the FSA method, here the applied state observer is capable of modelling 
both the leak-free and leaking conditions of the pipe due to the presumable 
structurally modelled leaks at known locations along the pipeline. Partial dif- 
ferential equations representing the fluid (a gas or a liquid) flow are lumped 
by using the method of characteristic lines, and the obtained nonlinear finite 
difference model (discrete in time and space) is solved with the aid of the it- 
erative numerical method. The leak location and size are estimated based on 
these components of the state vector which estimate the structurally mod- 
elled leak flow rates. The modelling errors are minimised with the use of 
the extended Kalman filter technique applied to a model linearised around a
stationary state monitored on-line. 

The applicability and effectiveness (sensitivity, accuracy, reliability and 
robustness) of the presented techniques have been illustrated by means of 
simulated experiments performed with the use of a computer emulator of 
transmission pipelines. 

Chapter 22



INDUSTRIAL APPLICATIONS

Jan Maciej KOSCIELNY*, Michal BARTYS*,
Michal SYFERT*, Mariusz PAWLAK**



22.1. Introduction 

The requirements concerning the industrial application of fault detection and 
isolation seem to be quite natural taking into account process safety, end 
product quality and process economy factors. The pilot industrial imple- 
mentations of chosen diagnostic methods will be presented and described in 
this chapter. Those methods are principally based on fuzzy or fuzzy neural 
network models used for the fault detection and isolation purposes. The pre- 
sented examples deal with very complex systems, e.g., the steam-water line of 
the power boiler, the sugar manufacturing evaporator unit, as well as simple 
ones, namely, final control elements. The role and importance of fault
detection and isolation can be illustrated by a particular fault of the final
control element controlling the inflow rate of thin juice to the first
evaporator apparatus of a sugar factory. This fault may completely stop the
process if it is not detected within 30 s. An example of the application of
diagnostics to the development of a fault-tolerant controller of a
condensation power turbine is also briefly described.

22.2. Fault diagnosis of the steam-water line
of the power boiler

22.2.1. System description 

The steam-water line of the power boiler is a part of a technological
installation located between the boiler itself and the inlet of fresh steam to
the power turbine. The steam-water line has a symmetrical structure. It
consists of two parts (left and right). The synoptic diagram of the steam-water
line of block no. 7 in the Siekierki Power Plant (Warsaw, Poland) is given in
Fig. 22.1. The installation consists of the following subsystems: a steam
superheater (1), a cooling water injector (2) and the control valve of cooling
water (3). The components (2) and (3) constitute the steam attemperator
assembly.




Steam is superheated in successive superheaters to a temperature higher
than the steam temperature on the boiler outlet. The superheater is an assembly
consisting of a set of steel pipes mounted in the boiler and heated by
combustion gases. Steam temperature should be controlled very carefully due to
the demands of optimising the watt-hour efficiency. Excessively high steam
temperature may cause wear of, or even damage to, the elements of the
steam-water line. Steam temperature is lowered by injecting cool water into the
steam flow. The steam attemperator assembly is used for this purpose. In the
attemperator the cooling water flow is controlled by the control valve.

The problem of fault detection and isolation in the steam-water line will
be described using the example of an assembly consisting of a steam
superheater, a cooling water injector and a cooling water control valve
(Koscielny et al., 1999; Koscielny and Syfert, 2000a; 2000b). The diagnostic
system of the entire steam-water line is based on extended modelling techniques
that consider additional heuristic relations between the process variables. For
example, steam temperatures measured in analogous parts of both branches of the
steam-water line should be strongly correlated. Too large a difference between
those temperature values may be perceived as a symptom of improper process
operation or the occurrence of a fault or faults.

The block diagram of the analysed steam-water line subassembly with 
measurement points indicated is given in Fig. 22.2. In the attemperator and 
superheater assemblies the redundant temperature transmitters are applied. 




Fig. 22.2. Block diagram of the subassembly of the steam-water
line of the power boiler



Hardware redundancy allows better fault detectability and isolability to be
achieved in this case. Redundant temperature measurement is very common
practice in such installations. The steam flow rate and pressure measurements
are more problematic. Steam pressure and flow transmitters are installed only
at both ends of the water-steam line. Therefore, the same process values are
used in the diagnostic algorithms of the first and the second attemperator for
both branches of the steam-water line. The diagnostic process is then
additionally complicated by the necessity of considering transport delays or
integrating the steam and water flows.

The index of process variables acquired from the attemperator-super- 
heater subassembly is given in Table 22.1. The faults that should be recog- 
nised in this subassembly are presented in Table 22.2. They refer to instru- 
mentation, actuators and most probable installation component faults. The 
sets of faults for the rest of the steam line components are created in a 
similar way. 

Table 22.1. Set of the process variables of
the attemperator-superheater subassembly

Symbol        Process variable
$T_{p1}$      Steam temperature on the inlet of the attemperator
$T_{p1}^R$    Steam temperature on the inlet of the attemperator - a redundant measurement
$P_w$         Cooling water pressure on the inlet of the control valve
$U$           Control value of the water injector controller
$X$           Position of the cooling water control valve plug
$F_w$         Flow rate of cooling water
$T_{p2}$      Steam temperature on the outlet of the attemperator
$T_{p2}^R$    Steam temperature on the outlet of the attemperator - a redundant measurement
$T_{p3}$      Steam temperature on the outlet of the superheater
$T_{p3}^R$    Steam temperature on the outlet of the superheater - a redundant measurement
$F_p$         Steam flow rate
$P_p$         Steam pressure

Table 22.2. Set of the analysed faults of
the attemperator-superheater subassembly

$f_k$       Fault description
$f_1$       Instrumentation fault of $T_{p1}$
$f_2$       Instrumentation fault of $T_{p1}^R$
$f_3$       Instrumentation fault of $T_{p2}$
$f_4$       Instrumentation fault of $T_{p2}^R$
$f_5$       Instrumentation fault of $T_{p3}$
$f_6$       Instrumentation fault of $T_{p3}^R$
$f_7$       Instrumentation fault of $F_p$
$f_8$       Instrumentation fault of $P_p$
$f_9$       Instrumentation fault of $F_w$
$f_{10}$    Instrumentation fault of $X$
$f_{11}$    Instrumentation fault of $P_w$
$f_{12}$    Servo-motor fault
$f_{13}$    Cooling water control valve fault
$f_{14}$    Injector fault

22.2.2. Fault detection of the water-steam line of the power boiler

Algorithms based on partial system models are used for the fault detection
of the components of the water-steam line of the power boiler. Partial models
are also created on the basis of redundant measurements. The diagnostic tests
presented in Table 22.3 are used for fault detection. The first four tests are
based on the partial models of the cooling water injector, the steam
superheater, the cooling water control valve and the valve servomotor. The
remaining three tests compare the results of redundant measurements taken in
the installation (Fig. 22.2).

The first test is based on the partial model of the cooling water injector.
The model inputs are the steam temperature on the injector inlet $T_{p1}$, the
steam flow rate $F_p$ and the cooling water flow rate $F_w$. The output of the
model is the steam temperature on the injector outlet $T_{p2}$.



Table 22.3. Set of diagnostic tests of the attemperator-superheater assembly

$s_j$    Algorithm                                                 Description of the algorithm
$s_1$    $r_1 = |T_{p2} - T_{p2}^*(T_{p1}, F_p, F_w)| < A_1$       Test of the conformity of steam temperature after passing the attemperator (using a model)
$s_2$    $r_2 = |T_{p3} - T_{p3}^*(T_{p2}, F_p)| < A_2$            Test of the conformity of steam temperature after passing the superheater (using a model)
$s_3$    $r_3 = |F_w - F_w^*(X, P_w - P_p)| < A_3$                 Test of the conformity of the water flow through the control valve (using a model)
$s_4$    $r_4 = |X - X^*(U)| < A_4$                                Test of the conformity of control valve plug displacement (using a model)
$s_5$    $r_5 = |T_{p1} - T_{p1}^R| < A_5$                         Test of the conformity of redundant measurements of steam temperature at the inlet to the attemperator
$s_6$    $r_6 = |T_{p2} - T_{p2}^R| < A_6$                         Test of the conformity of redundant measurements of steam temperature at the outlet from the attemperator
$s_7$    $r_7 = |T_{p3} - T_{p3}^R| < A_7$                         Test of the conformity of redundant measurements of steam temperature at the outlet from the superheater

The second test is based on a simple model of the steam temperature on the
superheater outlet $T_{p3}$, obtained from the steam temperature on the
superheater inlet $T_{p2}$ and the steam flow rate $F_p$. This model does not
take into account changes of the heat flow in combustion gases due to a lack of
appropriate measurements.

The third and fourth tests are based on the models of the cooling water
control valve and of the servomotor driving it. The water flow rate $F_w$ is
modelled from the control valve plug displacement $X$ and the pressure drop
across the valve, $P_w - P_p$, where $P_w$ is the cooling water pressure and
$P_p$ is the steam pressure. The plug displacement of the control valve is
modelled as a function of the controller signal $U$.
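
The seven tests of Table 22.3 share one pattern: a residual is the absolute difference between a measurement and either a model output or a redundant measurement, and a symptom is signalled when the residual reaches its limit. A minimal Python sketch of this pattern follows. The partial models are passed in as callables because the fuzzy and neural models themselves are not reproduced here; the function name, the dictionary keys and any limit values are assumptions made only for this illustration.

```python
def evaluate_tests(m, models, thresholds):
    """Return binary diagnostic signals s1..s7 (0 = test passed, 1 = symptom).

    m          -- dict of measurements (key names are assumed for this sketch).
    models     -- dict of callables standing in for the partial models.
    thresholds -- dict of residual limits, one per test.
    """
    residuals = {
        "s1": abs(m["Tp2"] - models["injector"](m["Tp1"], m["Fp"], m["Fw"])),
        "s2": abs(m["Tp3"] - models["superheater"](m["Tp2"], m["Fp"])),
        "s3": abs(m["Fw"] - models["valve"](m["X"], m["Pw"] - m["Pp"])),
        "s4": abs(m["X"] - models["servomotor"](m["U"])),
        "s5": abs(m["Tp1"] - m["Tp1R"]),
        "s6": abs(m["Tp2"] - m["Tp2R"]),
        "s7": abs(m["Tp3"] - m["Tp3R"]),
    }
    # A symptom appears when the residual is not below its limit.
    return {s: int(r >= thresholds[s]) for s, r in residuals.items()}
```

With toy stand-in models, a biased reading of the attemperator outlet temperature would be expected to violate the tests that use this measurement (the injector model, the superheater model and the corresponding redundancy test) and leave the others unaffected.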

The results of investigations of the detection features of the algorithms
based on partial models were given in (Koscielny et al., 1999; Koscielny and
Syfert, 2000a; 2000b). Fuzzy models created using a modified Wang-Mendel
approach as well as neural models based on unidirectional perceptron networks
with one or two hidden layers and a sigmoidal activation function were mainly
used in the investigations. The error back-propagation algorithm was used for
model tuning. Sufficiently good modelling results were achieved except for the
model of the steam superheater. In the case of the superheater model, the
primary source of errors is the lack of measurements of the heat energy of
combustion gases, which are necessary for proper model design.

Figure 22.3 gives an example of a graph illustrating the conformity degree
of the modelled and real values of the cooling water flow rate in the state of
full system operability. In the upper window two overlapping signals are shown:
the model output and the real process value. In the lower window the normalised
difference of both signals, related to the span of the real signal, is
presented.
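
The normalised difference plotted in the lower window can be sketched as follows. This is only an illustration under stated assumptions: the span is taken as the difference between the maximum and minimum of the recorded real signal, the sign convention (model minus real) is arbitrary, and the function name is hypothetical.

```python
def normalised_difference(real, model):
    """Difference between the model output and the real signal,
    related sample by sample to the span (max - min) of the real signal."""
    span = max(real) - min(real)
    if span == 0:
        raise ValueError("real signal is constant; its span is zero")
    return [(y_model - y_real) / span for y_real, y_model in zip(real, model)]
```

Relating the error to the signal span makes the modelling quality of variables with different units and ranges directly comparable.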



Fig. 22.3. Illustration of the modelling results of the cooling water
flow rate through the control valve; a fuzzy model based on a modified
Wang-Mendel approach was applied

22.2.3. Fault isolation of the water-steam line of the power boiler

Fault isolation concerning the components of the water-steam line of the power
boiler is based on the technique of comparing diagnostic test results with the
reference patterns given in the binary diagnostic table. The binary diagnostic
table for the attemperator-superheater subassembly is given in Fig. 22.4. One
can easily show that all of the examined faults are detectable except for the
faults $f_8$, $f_{11}$ and $f_{13}$.



Fig. 22.4. Binary diagnostic table for the superheater-attemperator subassembly



The conformity degree of diagnostic test results with the fault signatures
given in the binary diagnostic table may be expressed in fuzzy terms. For this
purpose a two-valued fuzzy evaluation of residuals was applied. The membership
functions of the linguistic terms of the residual are adjusted individually for
every diagnostic test.

Fig. 22.5. Fuzzy bi-valued evaluation of residuals

The activation degrees of the fuzzy inference rules are determined from the
membership degrees of the diagnostic test results. The PROD operator is used
for determining the conformity degrees of fault signatures with the fuzzy
results of diagnostic tests. After simple processing (normalisation with the
operator PROD/Σ), those degrees play the role of fault occurrence factors
(the F-DTS method).

Below, an example of reasoning with the application of the F-DTS 
method is given. For simplicity, the influence of the symptom duration and 
limitations of the maximal time on the diagnosis are neglected. 

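Let us assume that the following residual values and diagnostic signals
are registered:

$r_1 = 15 \Rightarrow s_1 = \{\mu_0 = 0,\ \mu_1 = 1\}$,
$r_2 = 12.5 \Rightarrow s_2 = \{\mu_0 = 0.2,\ \mu_1 = 0.8\}$,
$r_3 = 2 \Rightarrow s_3 = \{\mu_0 = 0.9,\ \mu_1 = 0.1\}$,
$r_4 = 5 \Rightarrow s_4 = \{\mu_0 = 1,\ \mu_1 = 0\}$,
$r_5 = 2 \Rightarrow s_5 = \{\mu_0 = 0.9,\ \mu_1 = 0.1\}$,
$r_6 = 10 \Rightarrow s_6 = \{\mu_0 = 0.1,\ \mu_1 = 0.9\}$,
$r_7 = 8 \Rightarrow s_7 = \{\mu_0 = 1,\ \mu_1 = 0\}$.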



The scheme of determining the values of diagnostic signals is shown in
Fig. 22.7. Next, the activation coefficients of particular inference rules are
determined (the PROD operator). After normalisation (the PROD/Σ operator), the
fault occurrence factors are determined. Figure 22.6 shows the fuzzy conformity
degrees of diagnostic signals, the reference patterns and the values generated
by the operators evaluating the conformity of on-line signatures with reference
signatures (the PROD and PROD/Σ operators). Figure 22.6 results in the
diagnosis DGN = $f_3$, pointing out a fault of the instrumentation of $T_{p2}$.

Fig. 22.6. Conformity degrees of fuzzy diagnostic and reference signals

Fig. 22.7. Scheme of residuals evaluation
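
The reasoning of this example can be sketched in Python as follows. The membership degrees are those read from the example above, taken as complementary pairs (no symptom, symptom). The reference signatures are assumptions made only for this illustration, because the content of the binary diagnostic table of Fig. 22.4 is not reproduced here; each signature lists the tests assumed to react to the fault. The function name `fdts` is likewise hypothetical.

```python
from math import prod

# Membership degrees (mu_0, mu_1) of the seven diagnostic signals:
# mu_0 = degree of "no symptom", mu_1 = degree of "symptom".
signals = {
    "s1": (0.0, 1.0), "s2": (0.2, 0.8), "s3": (0.9, 0.1), "s4": (1.0, 0.0),
    "s5": (0.9, 0.1), "s6": (0.1, 0.9), "s7": (1.0, 0.0),
}

# ASSUMED reference signatures: tests expected to react to each fault.
signatures = {
    "f1": {"s1", "s5"},
    "f3": {"s1", "s2", "s6"},
    "f5": {"s2", "s7"},
    "f7": {"s1", "s2"},
}


def fdts(signals, signatures):
    """Fault occurrence factors: PROD activation, then division by the sum."""
    activation = {
        fault: prod(mu[1] if s in expected else mu[0]
                    for s, mu in signals.items())
        for fault, expected in signatures.items()
    }
    total = sum(activation.values())
    if total == 0:
        return {fault: 0.0 for fault in activation}
    return {fault: a / total for fault, a in activation.items()}


factors = fdts(signals, signatures)
diagnosis = max(factors, key=factors.get)
```

Under these assumed signatures the largest factor is obtained for the fault whose signature involves the tests $s_1$, $s_2$ and $s_6$, that is, the tests using the attemperator outlet temperature.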

Figure 22.8 shows two examples of fault simulations used for testing the
fault isolation algorithms. Every graph consists of two windows. The upper
window shows the scenario of fault simulation and the lower window shows the
certainty degree of fault occurrence.

In the first case the fault was simulated by introducing some modifications
to the signals in the archive files of process variables. The fault detection
and isolation procedures were then performed in the off-line mode.

In the second case fault isolation in real time is presented. The fault is
introduced artificially by setting the cooling water flow rate directly from
the Distributed Control System (DCS).

Fig. 22.8. Simulation of a steam temperature transmitter fault
and a fault of the actuator control valve assembly

An example of the visualisation of the obtained diagnosis is shown in
Fig. 22.9. The numbers on the monitor screen indicate the values of the
diagnostic signals and, most importantly, the certainty degrees of particular
faults. The process operator is therefore informed about more or less uncertain
faults, which increases his knowledge about the process state and enables him
to make better-founded decisions also in situations where none of the faults
has obtained a sufficiently high certainty level.

Fig. 22.9. Example of diagnosis visualisation



22.3. Fault diagnosis of the evaporation station 
in a sugar factory 

22.3.1. System description 

The thickening of thin sugar beet juice by evaporation is performed in an
evaporation station. A typical evaporation station consists of 4 to 7 pieces of
technological apparatus (evaporators). The evaporators are joined into groups
called divisions. Every division may contain one or more evaporators. Vapours
produced by one division are fed into the subsequent one. The beet juice
passing through subsequent evaporators becomes denser due to water evaporation.
The heating energy is delivered directly to the heating chamber of the first
evaporator. This energy is recuperated from the waste steam delivered from the
power turbine. Successive evaporators are heated by the energy contained in the
vapours fed from the preceding divisions. Therefore, the temperature and
pressure of the vapours drop in subsequent divisions. To make the evaporation
process more efficient, negative pressure is required in the last division. The
vapours and waste steam are also used to heat the thin beet juice in the set of
preheaters prior to its inflow into the first evaporator.

The juice evaporation process is controlled by means of closed-loop systems
ensuring proper process quality factors. The main control systems used in this
case are: the juice level control acting on the juice inflow rates into
particular evaporators, the control of the temperature of the juice inflow to
the first evaporator, the vacuum control system in the last division, and the
juice outflow control system of the evaporation station. The synoptic diagram
of the evaporation station from the Lublin Sugar Factory (Poland) is shown
in Fig. 22.10. In the diagram one can distinguish buffer tanks of thin and
thick juice, juice pre-heaters, and seven evaporators. The evaporators are
assembled in five divisions. The second and the third division consist of two
evaporators each. In that case the vapours are fed in parallel to the heating
chambers of the evaporators in the division. In the fourth and the fifth
division the apparatus applied is different from that in the previous
divisions.

The basic element of any evaporation station is, of course, the evaporator
itself. To understand the diagnostic tests it is necessary to describe the
general construction and the principle of operation of the evaporator
apparatus. In the example let us focus our attention on an older type of the
evaporator, shown in Fig. 22.11.

Fig. 22.10. Synoptic diagram of the evaporation station
in the Lublin Sugar Factory in Poland




Fig. 22.11. Sketch of the evaporator apparatus 



Beet juice is fed into the lower part of the apparatus, filling the pipes
installed inside the heating chamber. Fresh or waste steam, or the vapours from
another apparatus, is fed into the heating chamber, thus heating the juice in
the pipes. The steam, giving up its heat, condenses into drops that are fed out
from the heating chamber to the set of condensate water treatment apparatus.
The mixture of vapours and juice drops in the evaporator chamber rises towards
the vapour chamber. In this chamber the juice drops are collected by means of a
drop catcher. Pure vapours rise towards the upper part of the vapour chamber,
are fed out from the evaporator and are next fed into the subsequent apparatus
for heating purposes. The thickened juice is also fed out into the pipeline
system of the next evaporator. The juice level in the evaporator is controlled
by acting on the juice inflow into the apparatus.

Due to the complexity of the described process and the complexity of the 
diagnostic system applied let us focus our attention only on a single evapo- 
rator apparatus. Information dealing with the entire diagnostic system, par- 
ticularly information dealing with the interactions between the evaporators, 
is given only if necessary. 

The index of process variables used for the diagnostic purposes of the
evaporator is given in Table 22.4. The complete set of variables of the station
comprises the analogous process variables of all divisions and, additionally, a
few variables describing the juice temperature in the pre-heaters and the
control variables of the system controlling the delivery of steam to the first
evaporator.



Table 22.4. Set of the evaporation process variables

Symbol         Description of process variable
$\Lambda_0$    Juice density on the evaporator inflow
$\Lambda_1$    Juice density on the evaporator outflow
$L_1$          Juice level in the evaporator
$P_1$          Pressure in the heating chamber
$T_0$          Temperature of the vapour chamber of the previous evaporator
$T_1$          Temperature in the lower part of the heating chamber
$T_2$          Temperature in the higher part of the heating chamber
$T_3$          Temperature in the vapour chamber
$T_4$          Juice temperature on the evaporator outlet
$F_1$          Juice inflow rate
$U_1$          Juice level control value
$I_1$          Binary signal of the juice pump state



In Table 22.5 the set of all examined faults of the evaporator is given.
The set consists of instrumentation faults, actuator faults, and process faults.
The set of all faults of the evaporation station is analogous and contains the
fault specifications of all divisions. The problems of evaporation station
diagnosis are described in many papers, e.g., in (Koscielny and Pieniazek,
1993; 1994).

Table 22.5. The set of faults of the evaporator

$f_k$       Fault description
$f_1$       Instrumentation fault of $\Lambda_0$
$f_2$       Instrumentation fault of $\Lambda_1$
$f_3$       Instrumentation fault of $L_1$
$f_4$       Instrumentation fault of $P_1$
$f_5$       Instrumentation fault of $T_0$
$f_6$       Instrumentation fault of $T_1$
$f_7$       Instrumentation fault of $T_2$
$f_8$       Instrumentation fault of $T_3$
$f_9$       Instrumentation fault of $T_4$
$f_{10}$    Instrumentation fault of $F_1$
$f_{11}$    Fault of the feedback path of $F_1$
$f_{12}$    Instrumentation fault of $I_1$
$f_{13}$    Pump fault
$f_{14}$    Pump switch off
$f_{15}$    Actuator fault
$f_{16}$    Controller fault
$f_{17}$    Uncontrolled water inflow (valve $Z_1$ opened)
$f_{18}$    Ammonia gases in the heating chamber
$f_{19}$    Water in the heating chamber
$f_{20}$    Build-up on the evaporator pipelines



22.3.2. Fault detection of the evaporator 

The scheme of detection algorithms will be given using the example of the
evaporator apparatus. The set of process variables shown in Table 22.4 is used
in the diagnostic tests applied. The tests, listed in Table 22.6, are performed
for the detection and isolation of the faults given in Table 22.5. The tests
$s_3$ to $s_9$ are based on heuristic relations between the process variables.
The remaining two tests, $s_1$ and $s_2$, are based on testing the conformity
of the model outputs with the process variables. Two simple partial models are
created for this purpose. The first model is used for reconstructing the
relation between saturated steam temperature and pressure in the vapour
chamber, while the second one is used for modelling the input-output
characteristics of the actuator assembly.

Table 22.6. Set of diagnostic tests of the evaporator

$s_j$    Algorithm                                              Description
$s_1$    $r_1 = |T_3 - T_3^*(P_1)| < A_1$                       Conformity test of temperature $T_3$ and temperature $T_3^*$ of saturated steam in the vapour chamber (using a model)
$s_2$    $r_2 = |F_1 - F_1^*(U_1)| < A_2$                       Actuator test - conformity test of the thin juice flow rate and the controller output (using a model)
$s_3$    $r_3 = |T_2 - T_1| < A_3$                              Conformity test of temperatures $T_2$ and $T_1$ in the upper and lower part of the heating chamber
$s_4$    $r_4 = |T_4 - T_3| < A_4$                              Conformity test of vapour chamber temperature $T_3$ and the temperature of juice on the apparatus outlet $T_4$
$s_5$    $r_5 = |T_1 - T_0| < A_5$                              Conformity test of vapour temperature from the previous apparatus $T_0$ and the temperature of the heating chamber $T_1$
$s_6$    $r_6 = |T_2 - T_3 - K_T| < A_6$                        Test of the temperature gradient between the heating and vapour chambers
$s_7$    $r_7 = |L_1 - L_1^{SP}| < A_7$                         Test of the control loop error ($L_1^{SP}$ - set-point value)
$s_8$    $r_8 = |\Lambda_1 - \Lambda_0 - K_\Lambda| < A_8$      Test of the juice density increase ($K_\Lambda$ - mean value of the juice density increase in the state of full operability)
$s_9$    $r_9 = I_1 - 1$                                        Test of pump operability

The models were tuned based on experimental results. The obtained residuals
are evaluated in a binary manner. For this purpose, the residual ranges
corresponding to positive and negative test results were defined for every
test. Examples of graphs illustrating the quality of the obtained models are
given below. The conformity degrees of the modelled and real changes of process
variables in the state of full process operability were chosen as the model
quality factors. In the following figures, two overlapping signals are shown in
the upper window: the model output and the real process value. The normalised
difference of both signals, related to the span of the real signal, is shown in
the lower window.

Figure 22.12 shows an example illustrating the quality of the model of
saturated steam temperature as a function of its pressure. In this case a
fuzzy model was applied. The residual $r_1$ (Table 22.6) is determined based on
this model.

Fig. 22.12. Illustration of the quality of the model
of the temperature of saturated steam

In Fig. 22.13 an example illustrating the quality of the model of the rate
of juice flowing through the control valve is shown. In this case a multilayer
neural network was applied as the model. Based on this model the value of the
residual $r_2$ was obtained.




Fig. 22.13. Illustration of the quality of the model of the thin juice mass flow rate 



The presented models are applied to all evaporators in the station. For
detection tasks some additional partial models of other evaporation station
components, such as the steam pre-heaters or the control valve of fresh steam,
are used. As in the case of the water-steam line of the power boiler, the
technological installation was decomposed into parts that can be described by
partial models.

Fault detection is performed with the application of the tests given in
Table 22.6. The detectability features of the tests were verified using an
appropriate fault simulation technique. Single faults were generated during
simulation in pre-defined time windows. Particular scenarios reflecting fault
histories and relative fault strengths were developed. Figure 22.15 gives
examples of test results for the simulated faults $f_4$, $f_8$, $f_{10}$ and
$f_{15}$. As can easily be seen from Fig. 22.15, the residuals $r_1$ and $r_2$
are sensitive to the simulated faults. Once a residual signal exceeds its
predefined limits, the fault detection signal appears. After fault recovery,
the residual values again fluctuate around zero.

Fig. 22.14. Binary diagnostic table for the evaporator

Fig. 22.15. Examples of test results for simulated faults of the vapour temperature
transmitter and the servomotor-control valve assembly

22.3.3. Fault isolation in the evaporator

The binary relation between the faults and the symptoms for the evaporator
apparatus is presented in Fig. 22.14. The following elementary blocks can be
distinguished: $\{f_1, f_2, f_{20}\}$, $\{f_3, f_{16}\}$, $\{f_4\}$, $\{f_5\}$,
$\{f_6\}$, $\{f_7\}$, $\{f_8\}$, $\{f_9\}$, $\{f_{10}, f_{11}\}$, $\{f_{12}\}$,
$\{f_{13}, f_{15}\}$, $\{f_{14}\}$, $\{f_{17}\}$, $\{f_{18}, f_{19}\}$. Faults
within the elementary blocks $\{f_1, f_2, f_{20}\}$, $\{f_3, f_{16}\}$,
$\{f_{10}, f_{11}\}$, $\{f_{13}, f_{15}\}$ and $\{f_{18}, f_{19}\}$ are not
isolable. In practice, better isolability may be achieved when considering
analogous test results from the neighbouring apparatus. For example, to
separate the fault associated with the presence of water in the heating chamber
from the fault caused by the overflow of ammonia gases in this chamber
(elementary block $\{f_{18}, f_{19}\}$) it is sufficient to apply a
three-valued evaluation of the residual $r_3$.
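
Elementary blocks are groups of faults with identical columns in the binary diagnostic table. A minimal Python sketch of how such blocks can be found is given below. The function name is hypothetical, and the table passed to it in any usage is an assumption, since the full content of Fig. 22.14 is not reproduced here.

```python
from collections import defaultdict


def elementary_blocks(table):
    """Group faults whose signatures in the binary table are identical.

    table -- dict mapping a fault name to the set of tests that detect it.
    """
    blocks = defaultdict(list)
    for fault, tests in table.items():
        # Faults with the same set of detecting tests cannot be told apart.
        blocks[frozenset(tests)].append(fault)
    return sorted(sorted(block) for block in blocks.values())
```

A block with more than one element indicates faults that the given set of binary tests cannot isolate from one another.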

Below, some examples of the application of the DTS diagnosis method 
(Section 19.2) for a single evaporator apparatus are given. For simplicity, the 
symptom duration and limitation for the allowed diagnosis time were not 
considered. 

Example 22.1. Let us assume that a symptom $s_2 = 1$ was observed. The
fault isolation procedure is performed as follows:

(a) The set of possible faults $F^1$, the set of diagnostic signals $S^1$ useful
for fault isolation and the set of the diagnostic signals $S_w(k)$ used are
determined:

$s_2 = 1 \Rightarrow F^1 = F(s_2) = \{f_{10}, f_{11}, f_{13}, f_{14}, f_{15}\}$,

$S^1 = \{s_2, s_4, s_7, s_9\}$.

(b) In appropriate time moments the values of particular diagnostic signals
are analysed:

$s_9 = 0 \Rightarrow F^2 = \{f_{10}, f_{11}, f_{13}, f_{15}\}$ - which reduces the set of possible faults,

$S_w(2) = \{s_9\}$,

$s_7 = 1 \Rightarrow F^3 = \{f_{13}, f_{15}\}$ - which reduces the set of possible faults,

$S_w(3) = \{s_9, s_7\}$,

$s_4 = 0 \Rightarrow F^4 = \{f_{13}, f_{15}\}$ - which verifies the set of possible faults,

$S_w(4) = \{s_9, s_7, s_4\}$,

$s_2 = 1 \Rightarrow F^5 = \{f_{13}, f_{15}\}$ - which verifies the set of possible faults,

$S_w(5) = \{s_9, s_7, s_4, s_2\}$.

The diagnosis points out two unisolable faults of the pump and the actuator:

$DGN = \{\{f_{13}\} \wedge \{f_{15}\}\}$.
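
The narrowing of the set of possible faults in Example 22.1 can be sketched in Python as follows. This is a simplified illustration under the single-fault assumption, not the complete DTS method: symptom durations and time limits are ignored, as in the example, and the function name is hypothetical. Apart from $F(s_2)$, which is given in the example, the fault sets used in any call are assumptions chosen to be consistent with the steps of the example.

```python
def dts_isolate(observations, detects):
    """Sequentially narrow the set of possible faults (single-fault assumption).

    observations -- list of (signal, value) pairs; the first pair is the
                    symptom that starts the procedure.
    detects      -- dict mapping a signal s to F(s), the faults it detects.
    """
    first_signal, first_value = observations[0]
    if first_value != 1:
        raise ValueError("the procedure starts from an observed symptom")
    possible = set(detects[first_signal])
    history = [set(possible)]
    for signal, value in observations[1:]:
        if value == 1:
            possible &= detects[signal]   # the fault must be detected by this test
        else:
            possible -= detects[signal]   # faults detected by this test are excluded
        history.append(set(possible))
    return possible, history
```

Each entry of the returned history corresponds to one of the sets $F^1, F^2, \ldots$ of the example, so a step that leaves the set unchanged is a verification step and a step that shrinks it is a reduction step.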

Example 22.2. Let us assume that a symptom $s_3 = 1$ was registered. The
fault isolation procedure is performed as follows:

(a) $s_3 = 1 \Rightarrow F^1 = F(s_3) = \{f_6, f_7, f_{18}, f_{19}\}$,

$S^1 = \{s_3, s_5, s_6\}$.

(b) Next, in appropriate time moments the values of particular diagnostic
signals are analysed:

$s_5 = 1 \Rightarrow F^2 = \{f_6\}$ - which reduces the set of possible faults,

$S_w(2) = \{s_5\}$,

$s_6 = 0 \Rightarrow F^3 = \{f_6\}$ - which verifies the set of possible faults,

$S_w(3) = \{s_5, s_6\}$,

$s_3 = 1 \Rightarrow F^4 = \{f_6\}$ - which verifies the set of possible faults,

$S_w(4) = \{s_5, s_6, s_3\}$.

The diagnosis points out an instrumentation fault of $T_1$:

$DGN = \{f_6\}$.

The binary diagnostic table for the entire evaporation station is much 
more complex than that given in Fig. 22.14. In Fig. 22.16 a part of this 
table is shown, including 43 faults and 45 diagnostic tests. Due to applying 
partial models, the binary diagnostic table for all evaporation stations has a 
shape similar to a diagonal one. In Fig. 22.16 the groups of tests assigned to 
particular evaporator sections are separated with a double line. In this case 
the tests assigned to a particular division detect mainly faults appearing in 
this particular division. Only a few faults are detected by tests assigned to
the neighbouring evaporator divisions.



22.4. Fault diagnosis of the pneumatic actuator-positioner- 
control valve assembly 

22.4.1. Introduction - the aims of the diagnostics of final control elements 

Control tasks of technological processes may be generally defined in terms
of acting on energy and mass flows. Actuators (final control elements)
are used for acting on those flows in real time. An example of a final control
element is shown in Fig. 22.17.

Fig. 22.16. Part of the binary diagnostic table designed for the evaporation station

Fig. 22.17. Final control element consisting of a control valve,
a pneumatic spring-and-diaphragm actuator, and a positioner

Faults or malfunctions of final control elements (e.g., control valves,
servo-motors, positioners) occur relatively often in industrial practice.
Actuators are installed mainly in harsh environments: high temperature, high
pressure, low or high humidity, dusty pollutants, chemical solvents,
aggressive media, vibrations, etc. This has a crucial influence on the final
control element's predicted lifetime. Malfunctions or failures cause long-term
process disturbances and sometimes even force an installation shutdown.
Moreover, faults of final control elements may lower the final product's
quality or cause significant economic losses. For fault prevention or
prediction, the on-line diagnostics of final control elements has been
applied. Permanent or occasionally performed diagnosis of actuators
significantly cuts installation maintenance costs. In industrial practice the
inspection of final control elements is performed periodically. The inspected
devices are dismounted from the technological installation and tested using
special service set-ups regardless of their real technical condition. In most
cases periodical inspections do not indicate any faults in the tested devices.
Remounting a previously dismounted final control element is often more costly
than dismounting it because additional measures are needed (e.g., re-adjusting
the pipeline terminals or replacing the seals). The introduction of remote
on-line diagnostics of actuators may cut the periodical inspection costs by
50-70%. In such cases the inspection and repair of actuators are undertaken
only if necessary.

The diagnosis of actuators has been considered in many papers. For the
fault detection and isolation of actuators, numerous approaches, methods and
algorithms have been developed, for example, the parity equation (Massoumia
and Van der Velde, 1988; Mediavilla et al., 1997), the unknown input observer
(Phatak and Wiswanadham, 1988), the extended Kalman filter (Oehler et
al., 1997), signal analysis (Deibert, 1994), fuzzy logic (Koscielny and Bartys,
1997; 2000), and B-splines (Benkhedda and Patton, 1997).

Intelligent positioners supporting auto-diagnostic functions have also been
developed (Isermann and Raab, 1993; Koscielny and Bartys, 1997;
Yang and Clarke, 1997). The decomposition of diagnostic tasks in complex
systems and the concept of intelligent actuators providing diagnostic features
were also presented in the papers (Bartys and Koscielny, 1999; 2000; 2001;
2002; Koscielny and Bartys, 1997; 1999; 2000; 2001; 2002).

22.4.2. Fault detection of the final control element 

The final control element commonly used in industrial practice is an
assembly consisting of three main parts: a control valve, a pneumatic
spring-and-diaphragm actuator and a positioner. If the positioner belongs to
the class of intelligent devices, then it is possible to implement additional
(diagnostic) positioner features besides the regularly performed control and
communication functions. A diagram of the final control element with an
intelligent positioner is given in Fig. 22.18. The following notations are
used: A - pneumatic spring-and-diaphragm actuator, V - control valve, V1, V2,
V3 - cut-off valves, CPU - positioner central processing unit, ACQ - data
acquisition unit, MODEM - system for digital communication, D/A -
digital-to-analogue converter, Ud - digital communication link, Ua - analogue
communication link (option), E/P - electro-pneumatic transducer,
DT - displacement transducer, PT - pressure transducer, FT - volume flow
rate transducer, I - control current of the E/P transducer, P - output
pressure of the E/P transducer, F - volume flow rate signal.

Fig. 22.18. Diagram of the final control element consisting of
a control valve, a pneumatic spring-and-diaphragm actuator and an
intelligent positioner mounted on the pipeline of the technological installation

In the presented final control element there are 19 faults $\{f_1, \ldots, f_{19}\}$ to
be distinguished. The faults fall into the following four categories:

• control valve faults $\{f_1, \ldots, f_7\}$,

• pneumatic actuator faults $\{f_8, \ldots, f_{11}\}$,

• positioner faults $\{f_{12}, \ldots, f_{15}\}$,

• general/external faults $\{f_{16}, \ldots, f_{19}\}$.

Control valve faults: $f_1$ - valve clogging, $f_2$ - valve or valve seat build-ups,
$f_3$ - valve or valve seat erosion, $f_4$ - increase in valve or bushing friction,
$f_5$ - external leakage (leaky bushing, covers, terminals), $f_6$ - internal leakage
(valve tightness), and $f_7$ - medium evaporation or critical flow.

Actuator faults: $f_8$ - twisted servo-motor piston rod, $f_9$ - servo-motor
housing or terminal tightness, $f_{10}$ - servo-motor's diaphragm perforation,
and $f_{11}$ - servo-motor's spring fault.

Positioner faults: $f_{12}$ - electro-pneumatic transducer fault, $f_{13}$ - rod
displacement sensor fault, $f_{14}$ - pressure sensor fault, and $f_{15}$ - positioner
spring fault.

General/external faults: $f_{16}$ - positioner supply pressure drop, $f_{17}$ -
unexpected pressure change across a valve, $f_{18}$ - fully or partly opened
cut-off valves, and $f_{19}$ - flow rate sensor fault.

Faults may appear in: the control valve, the servo-motor, the electro- 
pneumatic transducer, the DT transducer, the PT transducer and the mi- 
croprocessor control unit. The internal faults of the microprocessor control 
unit are detected autonomously by auto diagnostic procedures. This is the 
reason why control unit faults are not considered further and are not
indicated in the set of faults given above. For the remote on-line diagnostics
the following signals may be taken into account: $U$ - set-point value, $I$ -
control current of the electro-pneumatic transducer, $P$ - output pressure of
the electro-pneumatic transducer, $X$ - actuator's rod displacement, and $F$
- volume flow rate signal.

Algorithms used for the fault detection of the final control element may 
be divided into the following groups: 

(a) Algorithms based on relations between the signals. The relations may 
be expressed in the analytical, matrix or table form, or modelled with the 
application of neural and fuzzy models. Fault symptoms are detected after the 
evaluation of the differences between the real signal and the model outputs. 
Assuming the availability of the signals $U$, $I$, $P$, $X$ and $F$, it is possible to
obtain the set of the following residuals:



n = P - P{l), 


1 

II 

CO 


r2=X-X{P), 


r7 = I-i(U,X), 


r3 = F-F{X), 


rs = P-P{U,X), 


T 4 = X- X{I), 


r,^X-X(U), 


r 3 =F-F{P), 


1 

II 

0 



(b) Algorithms based on the knowledge of the control element dead time
values acquired experimentally by applying small changes of the set-point
signal (typically 3%): $U + \Delta U$ and $U - \Delta U$. In this case the signals $P$, $X$
and $F$ are output signals. This method is applied in practice and the only
additional signal that is needed is the actuator's rod displacement signal $X$.

(c) Algorithms based on the recovery time of the P and X signals ob- 
tained from the pulse responses of the system. 

(d) Algorithms based on simple heuristic relations between the measured 
signals. Those relations allow us to achieve appropriate diagnostic signals 
based on conformity tests between the signals and the directions of their 
changes, particularly in the end positions of the rod. Below, some examples 
of those tests for a normally closed valve are given. The index "o" denotes the
nominal signal value related to full valve opening and the index "c" denotes
the nominal signal value related to full valve closing.

$I = I_o$ and $P < P_o - \Delta P$ - too low opening pressure,

$I = I_o$ and $P > P_o + \Delta P$ - too high opening pressure,

$I = I_c$ and $P < P_c - \Delta P$ - too low closing pressure,

$I = I_c$ and $P > P_c + \Delta P$ - too high closing pressure,

$X = X_c$ and $F > 0 + \Delta F$ - media flow by a fully closed valve,

$|\Delta P| > |\Delta P_{min}|$ and $|\Delta X| = 0$ - rod displacement insensitive to pressure
changes in the actuator's chamber.
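
The heuristic conformity tests listed above map directly onto simple comparisons. The following Python sketch is only an illustration under stated assumptions: the function names, the dictionary of nominal values (`I_o`, `P_o`, `I_c`, `P_c`, `X_c`), the tolerance used to decide that a signal "equals" its nominal value, and all numbers in the usage lines are hypothetical.

```python
import math


def end_position_tests(I, P, X, F, nom, dP, dF, tol=1e-9):
    """Return the names of the end-position tests that fire for one sample."""
    def same(a, b):
        # Equality of measured and nominal values up to a small tolerance.
        return math.isclose(a, b, abs_tol=tol)

    found = []
    if same(I, nom["I_o"]):
        if P < nom["P_o"] - dP:
            found.append("too low opening pressure")
        if P > nom["P_o"] + dP:
            found.append("too high opening pressure")
    if same(I, nom["I_c"]):
        if P < nom["P_c"] - dP:
            found.append("too low closing pressure")
        if P > nom["P_c"] + dP:
            found.append("too high closing pressure")
    if same(X, nom["X_c"]) and F > 0 + dF:
        found.append("media flow by a fully closed valve")
    return found


def rod_insensitive(delta_P, delta_X, delta_P_min, tol=1e-9):
    """Pressure changed noticeably but the rod did not move."""
    return abs(delta_P) > abs(delta_P_min) and abs(delta_X) <= tol


# Hypothetical nominal values, for illustration only.
NOMINAL = {"I_o": 20.0, "P_o": 100.0, "I_c": 4.0, "P_c": 20.0, "X_c": 0.0}
print(end_position_tests(20.0, 90.0, 1.0, 5.0, NOMINAL, dP=5.0, dF=0.1))
print(end_position_tests(4.0, 20.0, 0.0, 0.5, NOMINAL, dP=5.0, dF=0.1))
```

In a real positioner the tolerances would have to reflect sensor noise, which is why they are kept as explicit parameters here.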



Tests of this type may be applied to different combinations of the
measured signals. For the chosen final control element, a diagnostic analysis
concerning the application of practicable tests was performed. An example of a
set fulfilling these requirements is shown in Table 22.7.



22.4.3. Fault isolation of the final control element 

Figure 22.19 shows the Fault Isolation System (FIS) designed for the final
control element assuming the set of tests given in Table 22.7. The presented
FIS is a reduct of an information system assuming all possible tests. For all
detection algorithms given in Table 22.7, three-valued residual evaluation was
used. This evaluation scheme allows considering the residual sign. In the given
FIS there are the following elementary blocks: $\{f_1\}$, $\{f_2\}$, $\{f_3, f_6\}$, $\{f_4, f_8\}$,
$\{f_5, f_7, f_{18}\}$, $\{f_9, f_{16}\}$, $\{f_{10}\}$, $\{f_{11}\}$, $\{f_{12}\}$, $\{f_{13}\}$, $\{f_{14}\}$, $\{f_{15}\}$, $\{f_{17}, f_{19}\}$.
The faults in FIS elementary blocks are unconditionally indistinguishable by
definition.

Table 22.7. Set of diagnostic tests for the final control element

S      Diagnostic test algorithms
$s_1$  $s_1 = -1$ if $r_1 < -K_1$; $0$ if $r_1 \in [-K_1, K_1]$; $+1$ if $r_1 > K_1$; where $r_1 = P - \hat{P}(I)$
$s_2$  $s_2 = -1$ if $r_2 < -K_2$; $0$ if $r_2 \in [-K_2, K_2]$; $+1$ if $r_2 > K_2$; where $r_2 = X - \hat{X}(P)$
$s_3$  $s_3 = -1$ if $r_3 < -K_3$; $0$ if $r_3 \in [-K_3, K_3]$; $+1$ if $r_3 > K_3$; where $r_3 = F - \hat{F}(X)$
$s_4$  $s_4 = -1$ if $r_4 < -K_4$; $0$ if $r_4 \in [-K_4, K_4]$; $+1$ if $r_4 > K_4$; where $r_4 = X - \hat{X}(I)$
$s_5$  $s_5 = -1$ if $r_5 < -K_5$; $0$ if $r_5 \in [-K_5, K_5]$; $+1$ if $r_5 > K_5$; where $r_5 = X - \hat{X}(U)$

Fig. 22.19. FIS for the pneumatic actuator-positioner-control valve assembly
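
The three-valued residual evaluation used by the tests of Table 22.7 can be illustrated with a short Python sketch. The function names and the numerical values are assumptions made for this illustration; a residual lying exactly on a threshold is treated here as "no symptom".

```python
def evaluate_residual(r, K):
    """Three-valued evaluation of a residual r with threshold K > 0."""
    if r < -K:
        return -1
    if r > K:
        return +1
    return 0


def diagnostic_signals(measured, modelled, thresholds):
    """Compute s_j for residuals r_j = measured_j - modelled_j."""
    return [
        evaluate_residual(y - y_hat, K)
        for y, y_hat, K in zip(measured, modelled, thresholds)
    ]


# Hypothetical values: three measurements, their model outputs, thresholds.
signals = diagnostic_signals([1.00, 0.50, 0.20], [1.30, 0.48, 0.05], [0.1, 0.1, 0.1])
print(signals)
```

Keeping the sign of the residual is what makes it possible to separate faults that would share the same signature under binary evaluation.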

The FIS implies the following rules:

(a) The rule for the state of aptitude:

If $(s_1 = 0) \cap (s_2 = 0) \cap (s_3 = 0) \cap (s_4 = 0) \cap (s_5 = 0)$
then the state of aptitude.

(b) The rules for elementary blocks satisfying the relation of unconditional
distinguishability:

If $(s_1 = -1) \cap (s_2 = -1) \cap (s_3 = 0) \cap (s_4 = -1) \cap (s_5 = -1)$ then $f_{10}$.

If $((s_1 = +1) \cup (s_1 = -1)) \cap ((s_2 = +1) \cup (s_2 = -1)) \cap (s_3 = 0) \cap (s_4 = 0) \cap (s_5 = 0)$ then $f_{14}$.

If $(s_1 = 0) \cap (s_2 = 0) \cap (s_3 = 0) \cap (s_4 = 0) \cap ((s_5 = +1) \cup (s_5 = -1))$ then $f_{15}$.

If $(s_1 = 0) \cap (s_2 = 0) \cap (s_3 = +1) \cap (s_4 = 0) \cap (s_5 = +1)$ then $f_3 \cup f_6$.

(c) The rules for faults satisfying the relation of conditional distinguishability:

If $(s_1 = 0) \cap ((s_2 = +1) \cup (s_2 = -1)) \cap (s_3 = 0) \cap ((s_4 = +1) \cup (s_4 = -1)) \cap (s_5 = -1)$ then $f_1 \cup f_4 \cup f_8$.

If $(s_1 = 0) \cap (s_2 = -1) \cap (s_3 = -1) \cap (s_4 = -1) \cap (s_5 = -1)$ then $f_2 \cup f_{13}$.

If $(s_1 = 0) \cap (s_2 = +1) \cap (s_3 = 0) \cap (s_4 = +1) \cap (s_5 = +1)$ then $f_4 \cup f_8 \cup f_{11}$.

If $(s_1 = 0) \cap (s_2 = 0) \cap (s_3 = -1) \cap (s_4 = 0) \cap (s_5 = 0)$ then $f_5 \cup f_7 \cup f_{17} \cup f_{18} \cup f_{19}$.

If $(s_1 = -1) \cap (s_2 = 0) \cap (s_3 = 0) \cap (s_4 = -1) \cap (s_5 = -1)$ then $f_9 \cup f_{12} \cup f_{16}$.

If $(s_1 = 0) \cap (s_2 = +1) \cap ((s_3 = +1) \cup (s_3 = -1)) \cap ((s_4 = +1) \cup (s_4 = -1)) \cap ((s_5 = +1) \cup (s_5 = -1))$ then $f_{13}$.

If $(s_1 = 0) \cap ((s_2 = +1) \cup (s_2 = -1)) \cap (s_3 = +1) \cap ((s_4 = +1) \cup (s_4 = -1)) \cap ((s_5 = +1) \cup (s_5 = -1))$ then $f_{13}$.

If $(s_1 = 0) \cap ((s_2 = +1) \cup (s_2 = -1)) \cap ((s_3 = +1) \cup (s_3 = -1)) \cap (s_4 = +1) \cap ((s_5 = +1) \cup (s_5 = -1))$ then $f_{13}$.

If $(s_1 = 0) \cap ((s_2 = +1) \cup (s_2 = -1)) \cap ((s_3 = +1) \cup (s_3 = -1)) \cap ((s_4 = +1) \cup (s_4 = -1)) \cap (s_5 = +1)$ then $f_{13}$.

If $(s_1 = 0) \cap (s_2 = -1) \cap (s_3 = 0) \cap ((s_4 = +1) \cup (s_4 = -1)) \cap (s_5 = +1)$ then $f_4 \cup f_8$.

If $(s_1 = 0) \cap ((s_2 = +1) \cup (s_2 = -1)) \cap (s_3 = 0) \cap (s_4 = -1) \cap (s_5 = +1)$ then $f_4 \cup f_8$.

If $(s_1 = 0) \cap (s_2 = 0) \cap (s_3 = +1) \cap (s_4 = 0) \cap (s_5 = 0)$ then $f_{17} \cup f_{19}$.

If $(s_1 = +1) \cap (s_2 = 0) \cap (s_3 = 0) \cap ((s_4 = +1) \cup (s_4 = -1)) \cap ((s_5 = +1) \cup (s_5 = -1))$ then $f_{12}$.

If $((s_1 = +1) \cup (s_1 = -1)) \cap (s_2 = 0) \cap (s_3 = 0) \cap (s_4 = +1) \cap ((s_5 = +1) \cup (s_5 = -1))$ then $f_{12}$.

If $((s_1 = +1) \cup (s_1 = -1)) \cap (s_2 = 0) \cap (s_3 = 0) \cap ((s_4 = +1) \cup (s_4 = -1)) \cap (s_5 = +1)$ then $f_{12}$.

If the premise of a rule is satisfied, then this rule generates a
diagnosis pointing out the faults of its conclusion part. In all other cases
the diagnosis cannot be formulated. However, fault isolation based on those
rules does not take into account symptom uncertainties.
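
A minimal sketch of how such crisp rules may be evaluated is given below. Only the rule for the state of aptitude and the four rules of group (b) are encoded; the data structure (one set of admissible values per diagnostic signal) and the function name `diagnose` are assumptions made for this illustration.

```python
Z, NEG, POS, PM = {0}, {-1}, {+1}, {+1, -1}

# Each rule: (admissible values of s1..s5, conclusion).
RULES = [
    ((Z, Z, Z, Z, Z), ("state of aptitude",)),
    ((NEG, NEG, Z, NEG, NEG), ("f10",)),
    ((PM, PM, Z, Z, Z), ("f14",)),
    ((Z, Z, Z, Z, PM), ("f15",)),
    ((Z, Z, POS, Z, POS), ("f3", "f6")),
]


def diagnose(signals):
    """Return the conclusion of the first rule whose premise is satisfied.

    signals is a tuple (s1, ..., s5) of values from {-1, 0, +1}.
    None means that no diagnosis can be formulated.
    """
    for premise, conclusion in RULES:
        if all(s in allowed for s, allowed in zip(signals, premise)):
            return conclusion
    return None


print(diagnose((-1, -1, 0, -1, -1)))
print(diagnose((0, 0, 1, 0, 1)))
print(diagnose((1, 1, 1, 1, 1)))
```

The last call returns `None`, which corresponds to the case in which no premise is satisfied and the diagnosis cannot be formulated.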

The application of the fuzzy evaluation of residuals and fuzzy inference
mechanisms allows considering symptom uncertainties. However, the reasoning
rules are different from those presented above. In this
case reasoning is based on rules referring to fault signatures (FIS columns)
complemented by the rule for the aptitude state. It is clear that the rules for
unconditionally indistinguishable faults are identical.

The set of rules is as follows:

If $(s_1 = 0) \cap (s_2 = 0) \cap (s_3 = 0) \cap (s_4 = 0) \cap (s_5 = 0)$ then the state of aptitude.

If $(s_1 = 0) \cap ((s_2 = +1) \cup (s_2 = -1)) \cap (s_3 = 0) \cap ((s_4 = +1) \cup (s_4 = -1)) \cap (s_5 = -1)$ then $f_1$.

If $(s_1 = 0) \cap (s_2 = -1) \cap (s_3 = -1) \cap (s_4 = -1) \cap (s_5 = -1)$ then $f_2$.

If $(s_1 = 0) \cap (s_2 = 0) \cap (s_3 = +1) \cap (s_4 = 0) \cap (s_5 = +1)$ then $f_3$.

If $(s_1 = 0) \cap ((s_2 = +1) \cup (s_2 = -1)) \cap (s_3 = 0) \cap ((s_4 = +1) \cup (s_4 = -1)) \cap ((s_5 = +1) \cup (s_5 = -1))$ then $f_4$.

If $(s_1 = 0) \cap (s_2 = 0) \cap (s_3 = -1) \cap (s_4 = 0) \cap (s_5 = 0)$ then $f_5$.

If $(s_1 = 0) \cap (s_2 = 0) \cap (s_3 = +1) \cap (s_4 = 0) \cap (s_5 = +1)$ then $f_6$.

If $(s_1 = 0) \cap (s_2 = 0) \cap (s_3 = -1) \cap (s_4 = 0) \cap (s_5 = 0)$ then $f_7$.

If $(s_1 = 0) \cap ((s_2 = +1) \cup (s_2 = -1)) \cap (s_3 = 0) \cap ((s_4 = +1) \cup (s_4 = -1)) \cap ((s_5 = +1) \cup (s_5 = -1))$ then $f_8$.

If $(s_1 = -1) \cap (s_2 = 0) \cap (s_3 = 0) \cap (s_4 = -1) \cap (s_5 = -1)$ then $f_9$.

If $(s_1 = -1) \cap (s_2 = -1) \cap (s_3 = 0) \cap (s_4 = -1) \cap (s_5 = -1)$ then $f_{10}$.

If $(s_1 = 0) \cap (s_2 = +1) \cap (s_3 = 0) \cap (s_4 = +1) \cap (s_5 = +1)$ then $f_{11}$.

If $((s_1 = +1) \cup (s_1 = -1)) \cap (s_2 = 0) \cap (s_3 = 0) \cap ((s_4 = +1) \cup (s_4 = -1)) \cap ((s_5 = +1) \cup (s_5 = -1))$ then $f_{12}$.

If $(s_1 = 0) \cap ((s_2 = +1) \cup (s_2 = -1)) \cap ((s_3 = +1) \cup (s_3 = -1)) \cap ((s_4 = +1) \cup (s_4 = -1)) \cap ((s_5 = +1) \cup (s_5 = -1))$ then $f_{13}$.

If $((s_1 = +1) \cup (s_1 = -1)) \cap ((s_2 = +1) \cup (s_2 = -1)) \cap (s_3 = 0) \cap (s_4 = 0) \cap (s_5 = 0)$ then $f_{14}$.

If $(s_1 = 0) \cap (s_2 = 0) \cap (s_3 = 0) \cap (s_4 = 0) \cap ((s_5 = +1) \cup (s_5 = -1))$ then $f_{15}$.

If $(s_1 = -1) \cap (s_2 = 0) \cap (s_3 = 0) \cap (s_4 = -1) \cap (s_5 = -1)$ then $f_{16}$.

If $(s_1 = 0) \cap (s_2 = 0) \cap ((s_3 = +1) \cup (s_3 = -1)) \cap (s_4 = 0) \cap (s_5 = 0)$ then $f_{17}$.

If $(s_1 = 0) \cap (s_2 = 0) \cap ((s_3 = +1) \cup (s_3 = -1)) \cap (s_4 = 0) \cap (s_5 = 0)$ then $f_{18}$.

If $(s_1 = 0) \cap (s_2 = 0) \cap ((s_3 = +1) \cup (s_3 = -1)) \cap (s_4 = 0) \cap (s_5 = 0)$ then $f_{19}$.

Example 22.3.

(a) Let us assume that the following diagnostic signals appeared:

$s_1 = \{(P, 0), (+N, 0), (-N, 1)\}$, $s_2 = \{(P, 0.4), (+N, 0), (-N, 0.6)\}$,

$s_3 = \{(P, 1), (+N, 0), (-N, 0)\}$, $s_4 = \{(P, 0.1), (+N, 0), (-N, 0.9)\}$,

$s_5 = \{(P, 0), (+N, 0), (-N, 1)\}$.

The values of the symptoms $s_2$ and $s_4$ belong to more than one fuzzy set.
In Table 22.8 the degrees of membership of the symptoms in the values of
the reference symptoms from the FIS table (Fig. 22.19) are given. The rules
firing levels are determined applying the PROD/T, operator. 

From Table 22.8 one can obtain the following diagnosis: 



$DGN = \{(f_{10}, 0.32), (f_9, 0.21), (f_{12}, 0.21), (f_{16}, 0.21)\}$.



Table 22.8. Conformity degrees of diagnostic signals with reference values 
from the FIS table 

(b) Let us consider a particular case of symptoms, each assigned to one and
only one fuzzy set. Let

$s_1 = \{(P, 1), (+N, 0), (-N, 0)\}$, $s_2 = \{(P, 0), (+N, 0), (-N, 1)\}$,

$s_3 = \{(P, 1), (+N, 0), (-N, 0)\}$, $s_4 = \{(P, 0), (+N, 0), (-N, 1)\}$,

$s_5 = \{(P, 0), (+N, 1), (-N, 0)\}$.

In this case the diagnosis $DGN = \{(f_4, 0.5), (f_8, 0.5)\}$ points out an equal
certainty degree of each of two unisolable faults. For the given symptoms the
faults pointed out by the diagnosis are distinguishable from $f_1$ and $f_2$. If the
value of the signal were given by $s_5 = \{(P, 0), (+N, 0), (-N, 1)\}$, then the
faults $f_4$ and $f_8$ would not be isolable from $f_1$.
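
The fuzzy reasoning of case (b) can be retraced with the Python sketch below. It rests on several assumptions that are not stated in the text: the intersection in a rule premise is computed as a product, the union of two admissible values as a maximum, and the firing levels are normalised so that they sum to one. The fuzzy set labelled P is mapped to the symptom value 0, which is how the example uses it. The signatures are copied from the rule set given above.

```python
from math import prod

Z, NEG, POS, PM = (0,), (-1,), (1,), (1, -1)

# Fault signatures (admissible values of s1..s5) taken from the rule set.
SIGNATURES = {
    "f1": (Z, PM, Z, PM, NEG),   "f2": (Z, NEG, NEG, NEG, NEG),
    "f3": (Z, Z, POS, Z, POS),   "f4": (Z, PM, Z, PM, PM),
    "f5": (Z, Z, NEG, Z, Z),     "f6": (Z, Z, POS, Z, POS),
    "f7": (Z, Z, NEG, Z, Z),     "f8": (Z, PM, Z, PM, PM),
    "f9": (NEG, Z, Z, NEG, NEG), "f10": (NEG, NEG, Z, NEG, NEG),
    "f11": (Z, POS, Z, POS, POS), "f12": (PM, Z, Z, PM, PM),
    "f13": (Z, PM, PM, PM, PM),  "f14": (PM, PM, Z, Z, Z),
    "f15": (Z, Z, Z, Z, PM),     "f16": (NEG, Z, Z, NEG, NEG),
    "f17": (Z, Z, PM, Z, Z),     "f18": (Z, Z, PM, Z, Z),
    "f19": (Z, Z, PM, Z, Z),
}


def diagnose(symptoms):
    """symptoms: five dicts mapping a value in {0, 1, -1} to its membership."""
    levels = {
        fault: prod(max(s[v] for v in allowed)       # union: max (assumed)
                    for s, allowed in zip(symptoms, signature))
        for fault, signature in SIGNATURES.items()    # intersection: product
    }
    total = sum(levels.values())
    if total == 0:
        return {}
    # Normalisation to a unit sum is an assumption of this sketch.
    return {f: lvl / total for f, lvl in levels.items() if lvl > 0}


CASE_B = [
    {0: 1, 1: 0, -1: 0},  # s1
    {0: 0, 1: 0, -1: 1},  # s2
    {0: 1, 1: 0, -1: 0},  # s3
    {0: 0, 1: 0, -1: 1},  # s4
    {0: 0, 1: 1, -1: 0},  # s5
]
print(diagnose(CASE_B))
```

For the crisp symptoms of case (b) only the rules for $f_4$ and $f_8$ fire, so both faults receive the same certainty degree.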

22.5. Fault diagnosis of the condensation power turbine
controller tolerating instrumentation faults

22.5.1. Condensation turbine controller 

Turbine rotational velocity control is the primary function of the
condensation turbine controller. The controller acts on the live steam mass
flow by means of a set of hydraulic control valves. The controller input
signals are also used for performing turbine protection tasks (Pawlak, 2002).
A simplified block diagram of the turbine instrumentation is given in
Fig. 22.20. The main turbine controller output signal $Y_H$ is fed into the
electro-hydraulic transducer $(p_1/I)$, thus driving the positioners A
controlling the set of control valves Z. The controller also generates an
auxiliary control signal $Y_{PZ}$ for the boiler control system. The outputs of
the control system are the turbine power $P$ and the rotational turbine speed
$n$. Those signals are fed back to the main controller. The turbine operates
in two modes. The first mode is used when the turbine is not synchronised with
the electrical power system. Just before pulling the turbine into step the
controller is switched into the turbine rotational speed controlling mode. In
the second control mode the power signal replaces the rotational speed in
control system feedback. The set point is fed into the controller
$(Y_0, Y_1)$ from the national power load-dispatching agency. Live steam
pressure and live steam mass flow rate signals are also fed into the control
unit. The list of the input signals of the condensation turbine is presented
in Table 22.9.

Fig. 22.20. Simplified diagram of the instrumentation of the condensation turbine;
Notations: Z - set of control valves; S - set of actuators; T - power turbine;
K - condenser; G - generator; SE - electric power system; $F$ - live steam mass
flow rate; $p_T$ - steam pressure; $p_1$ - hydraulic oil pressure; $f$ - electrical
power system frequency; G - power generator; $n$ - turbine rotational speed;
$Y_0$, $Y_1$ - external control set point signals; ARGM - frequency and power
control system; WG - generator on-off switch; SST - turbine efficiency binary
signal from the turbine diagnostic module; $Y_1$ - binary signal of power demand
from ARGM; $Y_H$ - controller output; $Y_{PZ}$ - auxiliary controller output for
the boiler control unit



Table 22.9. Set of input signals and fault detection methods applied
to the condensation power turbine

Item  Signal                           Symbol  Unit        Fault detection method
1     Turbo set power                  $P$     MW          Fuzzy neural networks
2     Live steam pressure              $p_T$   MPa         Fuzzy neural networks
3     Live steam mass flow rate        $F$     t/h         Fuzzy neural networks
4     Rotational speed of the turbine  $n$     min$^{-1}$  Hardware redundancy (voting 2 from 3)
5     Electric system frequency        $f$     Hz          Test of correlation with the turbine rotational speed; threshold technique
6     Power set-point signal           $Y_0$   MW          Threshold technique
7     Power velocity set-point signal  $Y_1$   MW          Threshold technique


Faults of the turbine components, actuators and instrumentation may cause
an improper state of the turbine control system. Instrumentation faults
influence the control value, which may lead to unexpected changes
of the turbine power. This may make it necessary to stop the turbine.
Improper turbine behaviour is dangerous for human health and for the process
itself. This situation also brings significant economic losses.

The high reliability of contemporary condensation turbines has been
achieved by applying hardware redundancy and the idea of fault tolerant
systems. The idea of fault tolerant systems is based on the introduction of
on-line diagnostics and the reconfiguration of the control system structure or
parameters in faulty states. In recent years intensive research work has also
been conducted in the field of the application of analytical redundancy to
diagnostics and system reconfiguration. Patton (1997) and Blanke et al. (2000)
published survey papers in this field. Many contributions were devoted
to applications of fault tolerant systems (Won-Kee Son et al., 1997; Candau
et al., 1997).

22.5.2. Diagnostics of instrumentation 

Fault detection concerning instruments is performed based either on fuzzy
neural network models of turbine power, pressure and steam flow signals or
on simple algorithms that take advantage of hardware redundancy or threshold
checking techniques.

A significant advantage of Fuzzy Neural Models (FNN) is the ability to 
model non-linear processes. Huge real process data files are nowadays avail- 
able from DCS control systems commonly used in power plants. This gives 
the opportunity of modelling processes based on real process data and process 
knowledge. Process knowledge is used for defining qualitative models rather 
than quantitative ones. For turbine modelling purposes Horikawa MISO fuzzy
systems (Horikawa et al., 1991) were applied in the form of the set of the
following rules:

$R_i$: If $x_1$ is $A_{1i}$ $\cap$ $x_2$ is $A_{2i}$ $\cap \ldots \cap$ $x_n$ is $A_{ni}$ then $y = y_i$.

The number of rules is equal to $\prod_j K_j$, where $K_j$ is the number of
fuzzy sets assigned to the $j$-th input. Gaussian functions were used for the
fuzzification of crisp inputs. Thus the membership functions of the $x_j$ input
have the form

$$\mu_{A_{ji}}(x_j) = \exp\left(-\left(w_g\,(x_j + w_c)\right)^2\right).$$

The parameters (weights) $w_c$ and $w_g$ were used for defining the partitioning
rules of the space of discourse. $w_g$ is used for setting the function width and
the $w_c$ parameter is an offset in the space of discourse $X_j$. The normalised
firing level of the $i$-th rule is given by

$$\hat{\tau}_i = \frac{\prod_j \mu_{A_{ji}}(x_j)}{\sum_k \prod_j \mu_{A_{jk}}(x_j)},$$

where the network output is given by

$$y^* = \sum_{i=1}^{n} \hat{\tau}_i\, y_i.$$
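
The forward pass of such a fuzzy system can be sketched as follows. This is a minimal illustration and not the network used by the authors: it assumes a Gaussian membership function of the form $\exp(-(w_g (x_j + w_c))^2)$, a full grid of rules (one rule per combination of fuzzy sets), constant rule consequents, and hypothetical parameter values. The function names are likewise assumptions.

```python
import itertools
import math


def membership(x, w_c, w_g):
    """Gaussian membership; w_c shifts the centre, w_g sets the width."""
    return math.exp(-(w_g * (x + w_c)) ** 2)


def fnn_output(x, fuzzy_sets, consequents):
    """Output of a MISO fuzzy system with a full grid of rules.

    x           - list of crisp inputs
    fuzzy_sets  - fuzzy_sets[j] is a list of (w_c, w_g) pairs for input j
    consequents - one constant per rule, in itertools.product order
    """
    mu = [
        [membership(xj, w_c, w_g) for w_c, w_g in sets_j]
        for xj, sets_j in zip(x, fuzzy_sets)
    ]
    # Firing level of a rule: product of the memberships of its premise.
    firing = [math.prod(combo) for combo in itertools.product(*mu)]
    total = sum(firing)
    normalised = [f / total for f in firing]
    return sum(t * y for t, y in zip(normalised, consequents))


# One input with two symmetric fuzzy sets (hypothetical parameters).
SETS = [[(-1.0, 1.0), (1.0, 1.0)]]
print(fnn_output([0.0], SETS, [2.0, 4.0]))
```

With $K_j$ fuzzy sets per input the grid contains $\prod_j K_j$ rules, so the number of consequents grows quickly with the number of inputs.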

The following set of models was considered for the detection of instrument 
faults: 

Pt = Ft = f2{YH,_,,Ft-i), 

and Yt = hipr,-, , ^t-i, 

On the basis of those models the following residuals were generated:

$r_1 = P - \hat{P}$, $r_2 = F - \hat{F}$, and $r_3 = Y_H - \hat{Y}_H$.

Based on the model structure, the binary diagnostic table shown in 
Fig. 22.21 was applied. 





         $F$   $P$   $p_T$
$r_1$     1     1
$r_2$     1
$r_3$

Fig. 22.21. Binary diagnostic table



Five Gaussian membership functions were assigned to every linguistic 
variable in the models. The models were trained applying back propagation 
methods. Separate data sets were used for model training and validation. 

Data acquired from the real plant were used for model tuning. The quality
of modelling was estimated using the performance index $J$:

$$J = \frac{1}{N} \sum_{i=1}^{N} \frac{|y_i - \hat{y}_i|}{y_i} \times 100\ [\%],$$

where $N$ is the number of samples in the training data set, $\hat{y}_i$ is the
model output, and $y_i$ is the measurement value.
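
As a small numerical illustration, the index can be computed as a mean relative error expressed in percent. The averaging over $N$ samples and the function name are assumptions of this sketch, and the data are invented.

```python
def performance_index(measured, modelled):
    """Mean relative modelling error in percent."""
    n = len(measured)
    # abs(y) in the denominator guards against negative measurements
    # (an assumption; the measured power is positive in normal operation).
    return 100.0 / n * sum(
        abs(y - y_hat) / abs(y) for y, y_hat in zip(measured, modelled)
    )


# Invented data: two samples, each with a 1 % relative error.
J = performance_index([100.0, 200.0], [101.0, 198.0])
print(J)
```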

The results of modelling the turbine power are given in Fig. 22.22. The 
model quality performance index is sufficiently good (J = 0.28%). 

Figure 22.23 shows the turbine power model output and residuum in 
the case of a power transducer fault. Figs. 22.24 and 22.25 present system 
behaviour respectively in the case of a live steam mass flow transducer fault 
and a pressure sensor fault. The faults were introduced artificially into the 
real data stream by applying the track and hold technique. 

Residual sensitivity to faults is clearly visible in the graphs shown in 
Figs. 22.22, 22.24, 22.25. Some additional residual processing (filtering) 
is applied to reject the high frequency components of the residual sig- 
nal spectrum. Time-window moving averaging filters are very efficient in 
this case. 
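
A time-window moving average of the residual may be sketched as follows. The window length and the sample values are hypothetical; during the first samples the average is taken over the values available so far.

```python
from collections import deque


def moving_average(residuals, window):
    """Average each residual sample with up to window-1 preceding samples."""
    buffer = deque(maxlen=window)  # old samples drop out automatically
    filtered = []
    for r in residuals:
        buffer.append(r)
        filtered.append(sum(buffer) / len(buffer))
    return filtered


# A single spike is attenuated, a persistent deviation is preserved.
filtered = moving_average([0, 0, 3, 0, 0, 3, 3, 3], window=3)
print(filtered)
```

A longer window rejects more of the high-frequency content but also delays the moment at which a persistent residual crosses the detection threshold.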

The reconfiguration of the turbine control system is performed imme- 
diately after fault isolation. In Pawlak’s doctoral thesis (2002) all control 
system structures in the case of single instrumentation faults are presented. 
Table 22.10 gives the means of reconfiguration. 

Fig. 22.22. Example of modelling the turbine power; Notations:
$F$ - live steam mass flow rate, $P$ - measured power, $\hat{P}$ - power
model output, and $r_1$ - power residuum

Fig. 22.23. Process and model variables in the case of a power
sensor fault simulation

Fig. 22.24. Process and model variables in the case of a live
steam flow sensor fault simulation



Table 22.10. Reconfiguration measures in states with single
instrumentation faults

State | Instrument fault | Action in the case of fault isolation
F1 | Turbo set power | Switching into the manual control mode
F2 | Live steam pressure | Switching on additional "fast" power control systems. Switching off the power limiting unit in the steam supply line. Setting minimal turbine velocity
F3 | Live steam mass flow rate | Switching off the $Y_{PZ}$ controller output signal driving the power boiler
F4 | Power set-point signal - $Y_0$ | Replacing the set-point signal $Y_0$ with $P_0$ ($P_0$ - power set point signal controlled manually)
F5 | Power velocity set-point signal - $Y_1$ | Switching off the secondary control system of $Y_1$
F6 | Electric system frequency | Switching the frequency control system into the rotational turbine speed control system
F7 | Rotational speed of the turbine | Switching off the faulty instrument and continuing the power control based on the remaining rotational speed transducers

The approved FDI system based on FNN may be applied as software
redundancy support in fault tolerant systems, significantly increasing system
reliability. Well-designed and sufficiently fast (low time-to-diagnosis)
diagnostics may be a key factor in achieving high system immunity to
faults.



22.6. Summary 

Four chosen examples of industrial applications of fault detection and isola- 
tion were presented in this chapter. All FDI methods applied here are based 
on partial models of chosen parts of a technological installation. Models based 
on the relations between the process variables are very useful in this case. 
Global process models are not necessary, which significantly increases the 
practical value of those methods. Obtaining global models in practice is ei- 
ther impossible or too expensive. Different techniques are used for modelling 
purposes. Fuzzy, neural and fuzzy-neural models are most often used. Those 
techniques allow us to obtain the models of strongly non-linear processes, 
which is almost impossible when applying analytical approaches. It must be 
clearly underlined that diagnostic tests may be based on the application of 
a wide spectrum of models and are not limited exclusively to those obtained 
when applying soft computing methods. Of particular importance is the fact 
that the methods presented in this chapter in the phase of model learning do 
not need any data with real faults. This is an important practical advantage 
because only in rare cases are such data available, particularly if we focus our 
attention on refinery or atomic power industries. 

Chapter 23



DIAGNOSTIC SYSTEMS 

Jan Maciej KOSCIELNY*, Pawel RZEPIEJEWSKI*, 
Piotr WASIEWICZ* 



23.1. Introduction 

In the last few years, a rapid growth of interest in industrial applications
of diagnostic systems has been observed. It results from the potentially high
economic profits which such applications can bring, as
well as from the rise of a new generation of control systems which facilitate
the application of advanced software tools to the supervisory control and
diagnostics of industrial processes.

Operators’ incorrect decisions, as well as faults of sensors, actuators and 
components of technological installations of the power, chemical, steel, food 
and other industries occur inevitably, in spite of the use of high-reliability de- 
vices. They cause significant and long-lasting disturbances of the production 
process, which lead to a decrease in its efficiency or can even stop the process. 
Economic losses in such cases are very high. Some faults evoke emergency sit- 
uations, e.g., a destruction of technological installations or contamination of 
the natural environment. They can also be dangerous for human life. Apart 
from catastrophic faults, incipient faults must also be considered. They can 
cause many problems because they may be unnoticed by the staff. 

The issues of process diagnosis and process protection become more and 
more important in this situation. Thus, computer-aided systems, which advise 
operators during the diagnosing process or generate diagnoses automatically, 
are becoming more and more significant. 

Various types of diagnostic systems are described in this chapter. Alarm
systems used in control systems are characterized. These systems are the
simplest diagnostic solutions for industrial processes. The main
part of the chapter is dedicated to diagnostic systems of industrial
processes. Among others, the DIAG system developed at the Institute of
Automatic Control and Robotics of the Warsaw University of Technology is
presented. Diagnostic systems of actuators are characterized in the final
part of the chapter.



23.2. Alarm systems in control systems 

All industrial control systems are equipped with alarm indicating systems,
which are simple versions of diagnostic systems advising process operators.
Control systems facilitate the detection and indication of the following
process alarm types: the exceeding of the process variable alarm limits, the
exceeding of the admissible rate of change of the process variable value, the
exceeding of the admissible control error values, and incorrect states of
binary process variables. These means significantly facilitate alarm analysis
and the identification of fault causes. However, the diagnostic functions
remain in the process operator's hands.
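
The first three alarm types can be illustrated with a few comparisons per sample. The sketch below is only an illustration: the function name, the parameter names and all numerical values are hypothetical, and the check of binary variable states is omitted.

```python
def detect_alarms(value, previous, dt, setpoint, low, high, max_rate, max_error):
    """Return the alarm types raised by one sample of a process variable."""
    alarms = []
    if value < low or value > high:
        alarms.append("limit")
    if abs(value - previous) / dt > max_rate:
        alarms.append("rate of change")
    if abs(setpoint - value) > max_error:
        alarms.append("control error")
    return alarms


# Hypothetical sample: the value jumps by 5 units within one time unit.
alarms = detect_alarms(
    value=105.0, previous=100.0, dt=1.0, setpoint=100.0,
    low=0.0, high=110.0, max_rate=2.0, max_error=10.0,
)
print(alarms)
```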

The standard view of an alarm table presents alarms in reverse chronological
order. The most recent alarm is displayed at the top of the list,
followed by the earlier ones. Such a list commonly contains the
exact time of fault occurrence, its code, the process variable or the control
loop code, the alarm description and the alarm state (existing or cancelled).
Moreover, the alarm line blinks in the synopsis and a sound signal is emitted
until the alarm is acknowledged by the operator. An example of a printout of
an alarm list is shown in Fig. 23.1.

Alarm information is also embedded in a hierarchical structure of pictures
presenting the process data and the automatic control system data. The
level of detail increases as the scope of the visualised part of the object
decreases. Typical pictures present the complete process, technological
nodes and particular parts of the technological apparatus. Process variable
values that are in the state of alarm are usually displayed in red. The alarms 
are also visible in the pictures of process variable groups and the pictures of 
control loops, as well as the pictures of particular process variables or control 
loops. The trends of process variables facilitate the analysis of alarms and 
emergency situations. 

The alarm filter mechanism, whose task is to adapt the alarm stream to
particular system users, plays a very important role in the alarm system.
Different alarm subsets are of interest to system and plant managers,
process operators, technicians, automatic control and power engineers, etc.
Alarm stream filtration is used for alarm distribution purposes. It consists
in selecting alarms on the basis of the declared alarm types, priorities,
technological nodes, etc.
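Alarm filtering and the reverse chronological alarm table can be sketched
as follows. This is a minimal illustration, not the mechanism of any
particular system: the classes Alarm and AlarmFilter, the function
alarm_view, and the tags, node names and time stamps are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Alarm:
    time: str      # ISO 8601 text, so string order equals time order
    tag: str
    node: str
    priority: str
    message: str


@dataclass
class AlarmFilter:
    priorities: Optional[Set[str]] = None  # None accepts every priority
    nodes: Optional[Set[str]] = None       # None accepts every node

    def accepts(self, alarm: Alarm) -> bool:
        if self.priorities is not None and alarm.priority not in self.priorities:
            return False
        if self.nodes is not None and alarm.node not in self.nodes:
            return False
        return True


def alarm_view(alarms: Iterable[Alarm], flt: AlarmFilter) -> List[Alarm]:
    """Select the alarms for one user and list the most recent one first."""
    selected = [a for a in alarms if flt.accepts(a)]
    return sorted(selected, key=lambda a: a.time, reverse=True)


# Illustrative alarm stream (hypothetical tags and nodes).
ALARMS = [
    Alarm("2003-01-27T15:41:04", "LC51", "evaporator_1", "CRITICAL", "Juice level is too low"),
    Alarm("2003-01-27T15:45:04", "F74", "boiler_4", "WARNING", "Steam flow is too high"),
    Alarm("2003-01-27T15:43:52", "FC57", "evaporator_5", "CRITICAL", "Juice flow is too low"),
]
operator_view = alarm_view(ALARMS, AlarmFilter(priorities={"CRITICAL"}))
print([a.tag for a in operator_view])
```

A separate AlarmFilter instance per user group (managers, operators,
technicians) would give each group its own view of the same alarm stream.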









[Figure 23.1: screenshot of an alarm viewer window. Each row gives the alarm
priority (WARNING or CRITICAL), the tag name, the alarm message (e.g.,
"5th EVAPORATOR STATION: Juice temperature is too high") and the time stamp.]



Fig. 23.1. Example of an alarm list 



The disadvantages of alarm systems are as follows: 

• a high number of alarms indicated within a short period of time when
dangerous faults occur, which results in information overload for the
operators,

• the inability to detect many parametric faults, as well as delays in
fault detection, both resulting from the exclusive use of simple limit
checking of process variables,

• a lack of inference mechanisms that would make it possible to formulate
a diagnosis,

• an inconvenient manner of alarm presentation: a single fault usually
manifests itself as many alarms in different synopses and standard pictures,
while alarms that result from different faults can be indicated
simultaneously in the same synopsis.

All of this makes it difficult for process operators to formulate a
diagnosis, i.e., to identify the cause of a set of alarms, which is in many
cases necessary for undertaking an appropriate protective action. Thus, the
preparation of an adequate diagnosis depends only on the knowledge,
experience and the mental and physical state of the operator.



23.3. Diagnostic systems for industrial processes 

The imperfection of alarm indicating systems, as well as the necessity of rapid
and precise recognition of incorrect and emergency situations, have resulted in
research concerning the design of diagnostic systems for industrial processes.
The latest computer technology facilitates the use of methods which ensure
earlier fault detection than is possible in alarm systems. They also facilitate
the isolation and even identification of occurring faults.

The scope of on-line diagnostic functions can cover: 

• fault detection with the indication of detected symptoms, 

• fault isolation, 

• fault identification, 

• registration of diagnostic data, 

• diagnosis justification, 

• advising in dangerous situations. 

For fault detection purposes, the process is checked continuously on the
basis of process variable values. Various tests that can detect faults are
carried out to this end. The tests have the form of software algorithms, and
their outputs are diagnostic signals. The detected fault symptoms are usually
indicated by alarms. Any symptom can be caused by several or even a dozen or
so different faults. Therefore, the occurrence of a symptom does not point
unambiguously to a specific fault.

Different methods are used for fault detection. The most advanced fault 
detection methods are based on analytical, neural or fuzzy models of partic- 
ular process parts. Diagnostic signals are generated as the assessment result 
of the discrepancy between the real and modelled signals. 
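Residual generation and evaluation can be sketched in a few lines. The
example below assumes a model output is already available for every sample;
the threshold and the moving-window evaluation are illustrative choices,
and the function names residuals and diagnostic_signal are hypothetical.

```python
from typing import List, Sequence


def residuals(measured: Sequence[float], modelled: Sequence[float]) -> List[float]:
    """Discrepancy between the real signal and the model output, per sample."""
    if len(measured) != len(modelled):
        raise ValueError("signals must have the same length")
    return [y - y_model for y, y_model in zip(measured, modelled)]


def diagnostic_signal(res: Sequence[float], threshold: float,
                      window: int = 3) -> List[int]:
    """Binary diagnostic signal: 1 when the mean absolute residual over the
    last `window` samples exceeds `threshold`, otherwise 0."""
    out = []
    for k in range(len(res)):
        recent = res[max(0, k - window + 1): k + 1]
        mean_abs = sum(abs(r) for r in recent) / len(recent)
        out.append(1 if mean_abs > threshold else 0)
    return out


# Illustrative data: the measured flow departs from the model after sample 3.
measured = [10.0, 10.1, 9.9, 12.0, 12.1, 12.2]
modelled = [10.0] * 6
print(diagnostic_signal(residuals(measured, modelled), threshold=1.0, window=2))
```

Averaging over a short window trades a small detection delay for fewer
false symptoms caused by measurement noise.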

Fault isolation and fault identification are carried out on the basis of test
results (diagnostic signals). Knowledge about the relations existing between
diagnostic signals (test results) and faults is essential for fault isolation.
The diagnosis indicating the recognised faults is the result of this
reasoning. In the case of fault identification, the diagnosis also defines
the size of the fault (e.g., an estimate of the leakage value). Faults may not
always be recognised precisely (unambiguously). Information about the faults
may be transferred directly to the maintenance personnel. It can also be used
by the system to advise the process operators, giving them appropriate
instructions for emergency situations.
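The role of the relation between diagnostic signals and faults can be
illustrated with a binary signature table. This is a deliberately
simplified sketch: the fault names and signatures are hypothetical, and
exact signature matching is only one of several possible inference rules.

```python
from typing import Dict, List, Sequence, Tuple

# Expected values of three diagnostic signals (s1, s2, s3) for each fault.
# The table is invented for illustration.
SIGNATURES: Dict[str, Tuple[int, ...]] = {
    "f1_flow_sensor": (1, 1, 0),
    "f2_valve": (1, 0, 1),
    "f3_leak": (0, 1, 1),
    "f4_pressure_sensor": (1, 1, 0),  # same signature as f1
}


def isolate(observed: Sequence[int],
            signatures: Dict[str, Tuple[int, ...]]) -> List[str]:
    """Return all faults whose signature equals the observed signals."""
    if not any(observed):
        return []  # no symptoms, hence no fault indicated
    return sorted(f for f, sig in signatures.items()
                  if tuple(sig) == tuple(observed))


print(isolate((1, 0, 1), SIGNATURES))
print(isolate((1, 1, 0), SIGNATURES))
```

The second call returns two faults, which shows why a diagnosis may be
ambiguous: faults with identical signatures cannot be distinguished
without additional tests.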

Exact and quickly obtained diagnoses make it possible to take the necessary
protective actions. Thus, diagnostic systems together with protective actions
form the second, higher level of process protection, while the classical
technological blockades and protection systems carry out the lower level
tasks. The higher process protection layer, thanks to exact and quickly
obtainable diagnostic information, facilitates the limitation or elimination
of the consequences of faults. It is possible to avoid the activation of the
lower level protection, which in many cases can be a reason for a process
shut-down or a reduction of its efficiency.

The first diagnostic systems for industrial processes were isolated solutions
developed as research results. They were usually developed for specific
processes, so they were not universal. Such solutions are presented, for
example, in (Cassar et al., 1994; Jummo and Parkkinen, 1991). Currently,
diagnostic systems are either dedicated to specific object classes, e.g.,
the boiler of a power plant (Betz et al., 1992; Schlee and Simon, 1994;
Wetzl, 1994), or designed as more universal systems, which can be applied,
after an appropriate configuration, in different industrial branches
(Theilliol et al., 1997; Koscielny et al., 2000).

Diagnostic systems are commonly developed on the basis of real-time
skeleton expert systems, such as G2 (Halme et al., 1994; Betz et al., 1992;
Schlee and Simon, 1994; Szafnicki et al., 1994) or Nexpert Objects (Lee et
al., 1992). However, many systems are developed from scratch.

Well-known automatic control firms have started to introduce diagnostic
systems as extensions of large decentralised control systems. The following
ones can be given as examples: the ABB MODI system (Schlee and Simon, 1994),
dedicated for cooperation with the PROCONTROL P and Advant OCS systems, the
Siemens KNOBOS system (Wetzl, 1994), offered for the Teleperm XP system, as
well as the Honeywell ASM system (Abnormal Situation Management), being an
element of the Total Plant Solution system. On-line diagnostic systems are
also developed as extensions of universal supervisory control and
visualisation systems (Cassar et al., 1994; Lautala et al., 1991; Koscielny
et al., 1992; Hajha and Lautala, 1997).

Among the existing diagnostic systems, the following ones can be mentioned:

• MODI, ABB, (Betz et al., 1992; Schlee and Simon, 1994),

• KNOBOS, Siemens, (Wetzl, 1994),

• DAMATIC XD, Valmet, (Hartikainen, 1994),

• DIALOGS, designed by five industrial and three academic centres within
the framework of the EUREKA program, (Theilliol et al., 1997),

• TIGER (Milne and Trave-Massuyes, 1997),

• EFTAS (Nold, 1991),

• SEXTANT (Lore et al., 1994),

• ASM, Honeywell,

• DIAG (Koscielny et al., 2000).

As an example of a diagnostic system, the DIAG system (Koscielny, 1999; 
Koscielny et al., 2000) is presented below. The system has been developed at 
the Institute of Automatic Control and Robotics of the Warsaw University 
of Technology.



23.4. DIAG system for diagnosing industrial processes

The DIAG diagnostic system is designed for early detection and exact
isolation of faults of measuring devices, actuators and components of
technological installations in the chemical industry, the power industry,
sugar factories, etc. The DIAG system is adapted to cooperate with different
Decentralised Control Systems (DCS), as well as Supervisory Control And Data
Acquisition (SCADA) systems. A general functional structure of the system is
shown in Fig. 23.2.




Fig. 23.2. General structure of the DIAG system 



The DIAG system, on the basis of the values of process variables and
control signals acquired from the DCS or SCADA systems, carries out the
following tasks:

• fault detection, 

• fault isolation, 

• the graphical presentation of diagnoses in process synopses, graphs, 
sheets, charts, diagrams, 

• the preparation of diagnostic reports, 

• diagnosis justification in the form of the presentation of the set of
symptoms detected during the inference process,

• storing archival diagnoses,

• supporting decisions on protective actions,

• building fuzzy and neural models of process parts for fault detection 
purposes. 

The following fault detection methods are used in the DIAG system: 

• methods based on fuzzy and neural models,

• methods based on physical equations, e.g., balance ones,

• methods based on linear process models, 

• heuristic methods based on relations existing between signals, e.g., 
checking known relations between the values of process variables, checking 
the trends of process variable values, 

• classic methods, such as checking the trends, control errors, conformity 
of value changes of the control signal and the feedback signal, a comparison 
of redundant signals, checking alarm limits, etc. 

The selection of fault detection methods depends on the available knowledge
about the process. A recommended approach is to use partial models of
particular units of a technological installation. The set of partial models
should cover the whole process. Fuzzy and neural models as well as models
based on physical equations are particularly useful. Their main advantage is
that they can also be applied to non-linear processes.

The DIAG system includes a module for building fuzzy and neural models 
of technological installation parts. It uses the following modelling methods: 

• multi-layer perceptron networks, 

• fuzzy-neural networks, 

• Takagi-Sugeno-Kang fuzzy-neural networks, 

• Wang-Mendel fuzzy models. 

Such models are built, with the use of different training methods, on the
basis of experimental data measured during normal (fault-free) process
operation. Simpler heuristic relations existing between variables can be used
for fault detection when it is difficult to obtain process models. Such
knowledge is often accessible in process documentation or technical
literature. It can also be obtained from technicians, process operators, and
service staff.

The DIAG system is equipped with tests designed individually for each 
particular installation. The result of each test is a diagnostic signal. Fault 
isolation is carried out on the basis of all diagnostic signals generated on-line 
by the DIAG system. 

Fault isolation in the DIAG system is carried out on the basis of the F-DTS
method, which is described in Chapter 18. It uses a fuzzy logic approach to
residual value evaluation and diagnostic inference. The relation between
faults and symptoms is defined by experts and stored in the form of an
information system. Generated diagnoses indicate faults and the certainty
coefficients of their occurrence. Appropriate subsets of possible faults, and
of the diagnostic signals necessary for isolating them, are created at each
stage of the diagnosis formulation process.
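How fuzzy evaluation of residuals can lead to certainty coefficients is
sketched below. This is not the F-DTS algorithm of Chapter 18; it is a
generic illustration that assumes a piecewise-linear membership function
and the minimum operator as the fuzzy conjunction. The function names,
the fault names and all numerical values are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def symptom_degree(residual: float, low: float, high: float) -> float:
    """Degree (0..1) to which a residual is regarded as a symptom.
    Below `low` it is 0, above `high` it is 1, linear in between."""
    a = abs(residual)
    if a <= low:
        return 0.0
    if a >= high:
        return 1.0
    return (a - low) / (high - low)


def fault_certainties(degrees: Sequence[float],
                      signatures: Dict[str, Tuple[int, ...]]) -> Dict[str, float]:
    """Certainty of each fault as the weakest agreement between the observed
    symptom degrees and the signature expected for that fault."""
    result = {}
    for fault, sig in signatures.items():
        agreement = [mu if expected else 1.0 - mu
                     for mu, expected in zip(degrees, sig)]
        result[fault] = min(agreement)
    return result


# Hypothetical fault-symptom relation defined by an expert.
RELATION = {"fa": (1, 1, 0), "fb": (1, 0, 0), "fc": (0, 1, 1)}
degrees = [symptom_degree(4.0, 1.0, 3.0),   # clear symptom
           symptom_degree(2.5, 1.0, 3.0),   # partial symptom
           symptom_degree(0.5, 1.0, 3.0)]   # no symptom
print(fault_certainties(degrees, RELATION))
```

In contrast to binary signals, the graded degrees let the diagnosis rank
competing faults instead of only accepting or rejecting them.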

The fault detection and isolation methods applied make it possible to
develop the DIAG system continuously as the knowledge about the process is
extended. The DIAG system is also characterised by robustness to changes
occurring in the set of accessible measurement signals. Such changes can
result from earlier faults or can be caused by intentional device switch-offs
carried out by process operators. In accordance with these changes, the DIAG
system modifies the set of active tests. This can have an unfavourable
influence on the achieved diagnosis accuracy, but it eliminates the
possibility of introducing errors into the inference process.
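The adaptation of the set of active tests to the available measurements
can be sketched as a simple set operation. The test and signal names
below are hypothetical, and the rule "a test is active only if all of its
input signals are available" is an assumption made for illustration.

```python
from typing import Dict, List, Set


def active_tests(test_inputs: Dict[str, Set[str]],
                 available: Set[str]) -> List[str]:
    """Keep only the tests whose every input signal is currently available."""
    return sorted(name for name, inputs in test_inputs.items()
                  if inputs <= available)


# Hypothetical tests and the signals each of them needs.
TESTS = {
    "t_flow_model": {"CV", "P_in", "F"},
    "t_balance": {"F", "F_out"},
    "t_limit_P": {"P_in"},
}

print(active_tests(TESTS, {"CV", "P_in", "F", "F_out"}))
# The F_out transmitter is switched off, so the balance test is suspended.
print(active_tests(TESTS, {"CV", "P_in", "F"}))
```

Suspending a test removes its diagnostic signal from the inference, so
fewer faults may remain distinguishable, but no conclusion is drawn from
an invalid measurement.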

Diagnosis justification consists in presenting the reasoning process which
led to the diagnosis. It enables system users to verify the correctness of
the reasoning. Appropriate advisory instructions can be designed to support
process operators when particular faults occur.

The DIAG system software structure is presented in Fig. 23.3. The system 
modules are written in the C++ language. They can function in a distributed, 
heterogeneous software environment. The present system version has been 
tested in the Windows NT 2000 environment. 




Fig. 23.3. DIAG system structure 



Two main tasks of the DIAG system are carried out by the FDM-DIAG 
Fault Detection module and the FIM-DIAG Fault Isolation module. The 
WMMT-DIAG and NFMT-DIAG modules are designed for building fuzzy 
and neural process models. Such models are created in an interactive mode. 
The process variable values are the inputs of both modules. Each stage of 
model building is illustrated by diagrams and can be interrupted at any 
moment for inserting or modifying any model parameter. Then, the prepared 
model files can be included in the fault detection module, extending the data 
base of the diagnostic tests. 

The VM-DIAG visualisation module presents graphically the current values
and trends of residuals, as well as diagnostic signals and generated
diagnoses. The most important application features of the VM-DIAG module
are as follows: grouping all diagnostic information in a homogeneous
graphical interface, independence from local SCADA or DCS systems, as well
as the ability to integrate different visualisation technologies specific
to diagnostics. Information about generated diagnoses can be directed to an
automatic control system (DCS or SCADA), and the graphical visualisation
of diagnoses can be carried out there independently.

The DIAG system makes it possible to diagnose complex installations.
Research on the system has been carried out at the Lublin Sugar Factory
and the Siekierki-Warsaw Power Plant. Examples of modelling results for
two control valves of injection water in the power plant are shown in
Fig. 23.4 (a permanent water flow through the valve) and Fig. 23.5 (the
valve closed for long periods of time). The flow of injection water has been
modelled on the basis of two input signals: the control signal (controller
output) and the water pressure before the valve. Fully satisfactory fuzzy
and neural models of both control valves have been achieved.




Fig. 23.4. Modelling results of the injection water flow 
(the valve with a permanent water flow) 



An example of modelled and real trends of the flow signal with a simulated 
fault of the measuring loop is shown in Fig. 23.6. The fault is successfully 
detected. 

The way of visualizing diagnoses is shown in Fig. 23.7. The coefficient of 
the certainty of fault appearance is presented for each fault. If the coefficient’s 
value is very high, then the bar-graph presenting this value is red. For lower 
values, its colour is yellow, and for values close to zero, it is white. 
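The colour coding of the bar-graph can be expressed as a small mapping.
The text gives no numerical boundaries, so the thresholds 0.7 and 0.1
below are purely hypothetical, as is the function name bar_colour.

```python
def bar_colour(certainty: float, high: float = 0.7, low: float = 0.1) -> str:
    """Map a certainty coefficient in [0, 1] to a bar-graph colour.
    The thresholds are illustrative assumptions, not values from DIAG."""
    if not 0.0 <= certainty <= 1.0:
        raise ValueError("certainty must lie in [0, 1]")
    if certainty >= high:
        return "red"      # very high certainty of the fault
    if certainty > low:
        return "yellow"   # lower certainty
    return "white"        # close to zero


print([bar_colour(c) for c in (0.9, 0.4, 0.05)])
```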

Currently, an application of the DIAG system is being prepared at the Pulawy
Nitrogen Works for diagnosing the Isobaric Double Recycling (IDR) Section
of the Urea Manufacturing Plant.



Fig. 23.5. Modelling results of the injection water flow
(the valve closed for long periods of time)




Fig. 23.6. Example of the fault detection of the flow measuring loop 



23.5. Diagnostic systems for actuators 

Actuator diagnosis can be carried out periodically for service purposes, or
on-line. The main tasks of on-line diagnosis (state monitoring) are the early
detection and signalling of faults. The tests carried out should not disturb
the running process. Therefore, they should rely on the values of the
available process variables. The use of small test disturbances, which
influence the process imperceptibly, is permissible in some cases. Service
diagnostics is carried out mainly after the shut-down of the process, or
during process operation, after opening the pipe bypass and closing the
valves which cut off the part of the pipeline including the control valve.
Any test disturbances can be used in such cases. Usually, valve
characteristics are tested over their complete range.



Fig. 23.7. Synopsis of the VM-DIAG module showing the
detected symptoms and isolated faults (for the steam installation
of the boiler at the Siekierki-Warsaw Power Plant)



Specialized diagnostic systems are developed for intelligent actuators. In
most cases, their functions are limited to remote service diagnostics.
However, nowadays these systems increasingly perform on-line fault detection
as well. A dynamic development of such systems is expected in the near
future. On-line diagnosis can also be carried out locally with the use of
programmable controllers, which are parts of intelligent actuators (Bartys
and Koscielny, 2001, 2002; Koscielny and Bartys, 2000).

The remote diagnosis of actuators can be carried out exclusively on the
basis of signals transferred to the supervisory system. Analogue devices do
not provide many such signals, which limits the possibility of fault
isolation. For example, a set consisting of a pneumatic servo-motor and a
control valve that is installed in a flow control system has only the input
signal (from the controller) and the signal of the controlled flow. In the
case of control systems of other process variables, the diagnosis becomes
possible if the measurement of the servo-motor rod displacement is
accessible.

The diagnostic capabilities increase significantly in the case of modern,
intelligent actuators, which are equipped with many additional sensors. For
example, the FieldVue positioner manufactured by Fisher-Rosemount transfers
to the supervisory system the values of such signals as the input (control
signal), the servo-motor chamber pressure, the servo-motor rod position, the
temperature inside the positioner, and the end positions of the servo-motor
rod. On the basis of such signals, and optionally the flow signal, fault
detection, as well as fault isolation, is possible.

Diagnostic systems for intelligent actuators are offered by several
manufacturers. Fisher-Rosemount AMS and Samson Trovis-Expert can be
mentioned as examples of such systems. These systems are connected with
intelligent positioners through industrial networks such as Foundation
Fieldbus H1, Profibus PA, or HART. They also facilitate remote
configuration and calibration of actuators. The main aim of diagnostics is
to support the decision about the repair of these devices. The service cost
can be reduced, because the device condition can be tested without
unnecessary disassembling of the valve.

The local actuator diagnosis requires the use of microprocessors having a
high computational capacity. Currently, diagnostic functions in modern
positioners are reduced to simple algorithms of fault detection and the
evaluation of device wear. Typical diagnostic tests include checking the
consistency between the control signal and the position of the piston rod,
detecting that the control signal exceeds its boundaries, and counting the
cycles and the total movement of the rod, with signalling when acceptable
values are exceeded.
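Two of the tests named above can be sketched as follows. The sketch
assumes steady-state conditions (rod dynamics are ignored) and signals
scaled in percent; the tolerance, the 4..20 bounds used in the example and
both function names are hypothetical and not taken from any positioner.

```python
def consistency_symptom(control_pct: float, position_pct: float,
                        tolerance_pct: float = 5.0) -> bool:
    """True when the rod position disagrees with the control signal by more
    than the tolerance (both signals in percent of full range)."""
    return abs(control_pct - position_pct) > tolerance_pct


def out_of_bounds_symptom(signal: float, low: float, high: float) -> bool:
    """True when the control signal lies outside its permitted boundaries."""
    return signal < low or signal > high


# Illustrative values: the rod lags 10 % behind the command, and a control
# signal of 3.5 lies below an assumed lower boundary of 4.0.
print(consistency_symptom(50.0, 60.0))
print(out_of_bounds_symptom(3.5, low=4.0, high=20.0))
```

Both checks need only signals that the positioner already measures, which
is why they fit the limited resources of a local microprocessor.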

One can expect that the range of diagnostic functions of intelligent
positioners will be constantly increasing and will finally include fault
detection based on models identified in the preliminary operation phase, as
well as the isolation of faults. It will also be possible to combine the
concepts of the local and global actuator diagnosis by an appropriate
distribution of the functions between the local and global levels.

The V-DIAG system (Bartys and Koscielny, 2001; 2002; Koscielny and
Bartys, 2000), being an example of a diagnostic system for actuators, is
presented in Section 23.6. The V-DIAG system is a specialised version of the
DIAG system. It has been developed at the Institute of Automatic Control
and Robotics of the Warsaw University of Technology.



23.6. V-DIAG diagnostic system for actuators 

The V-DIAG diagnostic system includes algorithms designed for the detection
and isolation of actuator faults. It is also equipped with tools for
monitoring the degree of wear of actuator elements. The V-DIAG system carries
out the diagnostic tests on-line, on the basis of the process variable values
acquired from the control system. Communication between the diagnostic system
and the control system takes place with the use of the OPC Client tool.

Diagnostic tests consist in comparing process variables with the
corresponding reference values obtained from partial models of the diagnosed
device. The models are identified during the first phase of the system
operation, or they are created on the basis of the available archival data.



Fig. 23.8. Diagram of data acquisition and data flow in the V-DIAG system







Fig. 23.9. Results of valve modelling at the Lublin Sugar Factory



The detection of a valve fault can be signalled to the operator, e.g.,
in the form of a valve colour change in the synopsis. A detailed diagnosis is
accessible in the diagnostic system window (Fig. 23.10). Together with the
information about the type of fault, the certainty coefficient of the
diagnosis (height and colour of the bar-graph) is also given.



Fig. 23.10. Visualisation of diagnoses in the V-DIAG system



When a fault is indicated, the operator can estimate its size by comparing
the trends of process variables with the trends of variables obtained from
the models. In the case of parametric faults (faults developing slowly over
a long time), e.g., the erosion or sedimentation of the valve head and valve
seat, it is possible to monitor the speed of fault progress and to plan
repairs.

In the field of actuator monitoring, the V-DIAG system counts the number
of cycles and evaluates the total servo-motor rod displacement. Moreover,
statistical operation parameters of the actuator, e.g., the most frequent
position ranges of the servo-motor rod, as well as the amplitude of its
displacement, are calculated and monitored on-line in the form of
histograms. This facilitates the estimation of the degree of wear of
actuator elements, as well as the assessment of the correctness of valve
selection and control settings.
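The monitoring quantities mentioned above can be computed from a recorded
sequence of rod positions. The sketch below is an illustration only: the
text does not define a cycle, so counting direction reversals larger than
a dead band is an assumption, and the function names and sample data are
hypothetical.

```python
from typing import List, Sequence


def total_travel(positions: Sequence[float]) -> float:
    """Total rod displacement: sum of absolute position changes."""
    return sum(abs(b - a) for a, b in zip(positions, positions[1:]))


def count_reversals(positions: Sequence[float], deadband: float = 0.5) -> int:
    """Count changes of movement direction, ignoring moves below deadband."""
    if not positions:
        return 0
    reversals, direction, anchor = 0, 0, positions[0]
    for p in positions[1:]:
        delta = p - anchor
        if abs(delta) < deadband:
            continue  # too small to count as a movement
        new_direction = 1 if delta > 0 else -1
        if direction != 0 and new_direction != direction:
            reversals += 1
        direction, anchor = new_direction, p
    return reversals


def position_histogram(positions: Sequence[float], bins: int = 10,
                       lo: float = 0.0, hi: float = 100.0) -> List[int]:
    """Histogram of rod positions over equal-width bins between lo and hi."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for p in positions:
        index = int((p - lo) / width)
        counts[min(max(index, 0), bins - 1)] += 1  # clamp to valid bins
    return counts


# Illustrative rod positions in percent of full stroke.
rod = [10, 20, 30, 25, 20, 40, 40, 35]
print(total_travel(rod), count_reversals(rod), position_histogram(rod, bins=4))
```

A histogram concentrated near one end of the stroke may suggest, for
example, that the valve is poorly sized for its usual operating point.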



Fig. 23.11. Visualisation of valve modelling - an abrupt fault







Fig. 23.12. Visualisation of diagnostic test results - an abrupt fault



23.7. Summary

Recent years have seen a rapid development of decentralized control systems.
A new generation of systems has been created, having an open structure due
to the use of LAN and Fieldbus networks and standard operating systems.
Besides the development of both software and hardware standards, one of the
most important areas of competition between the manufacturers offering
systems for large technological installations is algorithms for advanced
control, optimisation and process diagnostics. At present, these often use
artificial intelligence techniques, such as fuzzy logic, artificial neural
networks, genetic algorithms and expert reasoning. The software is designed
for specific objects, most often in the chemical or power industries.

The development of automatic control can also be observed in the domestic
industry. Modern control systems are installed in most big companies of the
power and chemical industries. DCS control systems and SCADA monitoring
systems collect and archive the values of process variables, which can be
used later to build the models necessary to control, diagnose and optimise
processes. This facilitates putting advanced diagnosis solutions into
practice. The industry is also interested in such applications because of
the potentially high benefits which can be obtained from the use of such
systems.

One can predict that in the near future the development of applications of
diagnostic systems, as well as of advanced control and optimisation, will
intensify in different plants all over the world, such as heat and power
plants, power plants, chemical plants, pharmaceutical plants, petrochemical
plants, gas pipeline networks, water supply systems, steelworks, sugar
factories and many others. One can also point out that the application of
diagnostic software should be preceded by putting the advanced control and
optimisation algorithms into practice. Diagnostic systems require full
credibility of the measured signals used. Therefore, the on-line diagnosis
of measurements is the main task of diagnostic systems.

The use of on-line diagnostic systems supporting process operators should
result in an increase in process safety, the elimination of dangers for the
environment, the reduction of economic losses caused by damage, as well as
the elimination of situations in which process operators are overloaded with
alarm information.